Variable Length Lossless Coding for Variational Distance Class: An Optimal Merging Algorithm

Themistoklis Charalambous, Charalambos D. Charalambous and Sergey Loyka 

Abstract 

In this paper we consider lossless source coding for a class of sources specified by the total variational distance ball centred at a fixed nominal probability distribution. The objective is to find a minimax average length source code, where the minimizers are the codeword lengths (real numbers for arithmetic or Shannon codes), while the maximizers are the source distributions from the total variational distance ball. Firstly, we examine the maximization of the average codeword length by converting it into an equivalent optimization problem, and we give the optimal codeword lengths via a waterfilling solution. Secondly, we show that the equivalent optimization problem can be solved via an optimal partition of the source alphabet, and re-normalization and merging of the fixed nominal probabilities. For the computation of the optimal codeword lengths we also develop a fast algorithm with a computational complexity of order $O(n)$.

I. Introduction 

Lossless fixed-to-variable length source codes are often categorized into problems of known source probability distribution and unknown source probability distribution. For known source probability distribution several pay-offs are investigated in the literature, such as the average codeword length [1], the average redundancy of the codeword length [2], the average of an exponential function of the codeword length [3]–[5], and the average of an exponential function of the redundancy of the codeword length [5], [6]. Huffman-type algorithms are also investigated for some of these pay-offs [1], [5], [6]. For the average codeword length pay-off, the average redundancy is bounded below by zero and above by one. On the other hand, if the true probability distribution of the source is unknown and the code is designed solely based on a given nominal distribution (which is different from the true distribution), then the increase in the average codeword length due to incorrect knowledge of the true distribution is the relative entropy between the true distribution and the nominal distribution [1, Theorem 5.4.3]. Such problems with unknown probability distribution are often investigated via universal coding and universal modeling, and the so-called Minimum Description Length (MDL) principle based on minimax techniques, by assuming that the true source probability distribution belongs to a pre-specified class of source distributions [7]–[11], which may be parameterized or non-parameterized. Universal codes are often examined under various pay-offs such as average minimax redundancy, maximal minimax pointwise redundancy [2], and variants of them involving the relative entropy between the true probability distribution and the nominal probability distribution [10], [11].

In this paper, we investigate lossless variable length codes for a class of source probability distributions described by the total variational distance ball, centred at a fixed (a priori) nominal probability distribution, with the radius of the ball varying in the interval $[0,2]$. Since this problem falls into the universal coding and modeling category, we formulate it using minimax techniques. The formal description of the coding problem, which is made precise in the next section, is as follows. Given a nominal probability distribution $\mu \in \mathcal{P}(\Sigma)$ ($\mathcal{P}(\Sigma)$ being the set of probability vectors on a finite alphabet set $\Sigma$), the class of source probability distributions described by the total variation metric centred at $\mu$ and having radius $R \geq 0$ is defined by

$$\mathbb{B}_R(\mu) = \Big\{ \nu \in \mathcal{P}(\Sigma) : \|\nu - \mu\|_{TV} \triangleq \sum_{x \in \Sigma} |\nu(x) - \mu(x)| \leq R \Big\}. \tag{1}$$

The pay-off may be any one of those mentioned earlier; we consider minimizing the maximum of the average codeword lengths defined by

$$L_R(\mathbf{l}, \nu^*) = \max_{\nu \in \mathbb{B}_R(\mu)} \sum_{x \in \Sigma} l(x)\nu(x). \tag{2}$$
Specifically, our main objective is to find a prefix real-valued code length vector $\mathbf{l}^\dagger$ which minimizes the pay-off $L_R(\mathbf{l}, \nu^*)$.

There are various reasons which motivated us to consider the total variational distance class of sources $\mathbb{B}_R(\mu)$; below, we describe some of them. Total variational distance can be used to define the distance between the empirical distribution of a sequence and the fixed nominal source distribution $\mu \in \mathcal{P}(\Sigma)$ as follows. Given a sequence $x^n = \{x_1, x_2, \ldots, x_n\} \in \Sigma^n$, let $\nu(x; x^n)$ denote the empirical distribution of the sequence $x^n$ defined by $\nu(x; x^n) = \frac{N(x|x^n)}{n}$, with $N(x|x^n)$ the number of occurrences of $x$ in the sequence $x^n$. For $\epsilon > 0$, we call a sequence $x^n$ $\epsilon$-letter typical with respect to $\mu$ if $|\nu(x; x^n) - \mu(x)| \leq \epsilon \mu(x)$, $\forall x \in \Sigma$. The set of all such sequences $x^n$ satisfying this inequality is called the $\epsilon$-letter typical set $\mathcal{T}_\epsilon^n(\mu)$ with respect to $\mu$. Summing this inequality over $x \in \Sigma$ shows that the total variational distance between the empirical distribution $\nu(\cdot; x^n)$ and $\mu$ satisfies the bound $\|\nu(\cdot; x^n) - \mu\|_{TV} \leq \epsilon$. Therefore, the radius of the total variational ball can be easily obtained from observing specific sequences; the larger the value of $R$, the larger the admissible class of source distributions. The total variational distance is a true metric, hence it is a measure of difference between two distributions. By the properties of the distance metric, $\|\nu - \mu\|_{TV} \leq \|\nu\|_{TV} + \|\mu\|_{TV} = 2$, hence $R$ is further restricted to the interval $[0,2]$. The two extreme cases are $\|\nu - \mu\|_{TV} = 0$, implying $\nu = \mu$, and $\|\nu - \mu\|_{TV} = 2$, implying that the support sets of $\nu$ and $\mu$, denoted by $\mathrm{supp}(\nu)$ and $\mathrm{supp}(\mu)$, respectively, are non-overlapping, that is, $\mathrm{supp}(\nu) \cap \mathrm{supp}(\mu) = \emptyset$. Moreover, one of the most interesting properties of the total variational distance ball is that an admissible $\nu \in \mathbb{B}_R(\mu)$ need not be absolutely continuous with respect to $\mu$, denoted by $\nu \ll \mu$ and defined by: if $\mu(x) = 0$ for some $x \in \Sigma$, then $\nu(x) = 0$. Consequently, admissible distributions $\nu \in \mathbb{B}_R(\mu)$ can be defined on a larger alphabet than the nominal distribution $\mu$, that is, the support set of $\mu$ may be a subset of $\Sigma$.
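The typicality argument above can be checked numerically. The Python sketch below computes the empirical distribution $\nu(x; x^n)$, tests $\epsilon$-letter typicality, and evaluates the total variational distance as $\sum_x |\nu(x) - \mu(x)|$, the convention under which $R$ ranges over $[0,2]$. The function names, the nominal distribution and the sample sequence are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def empirical_distribution(seq, alphabet):
    """nu(x; x^n) = N(x | x^n) / n for every symbol x of the alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return {x: counts[x] / n for x in alphabet}


def tv_distance(p, q):
    """||p - q||_TV = sum_x |p(x) - q(x)|, which takes values in [0, 2]."""
    return sum(abs(p[x] - q[x]) for x in p)


def is_letter_typical(seq, mu, eps):
    """epsilon-letter typicality: |nu(x; x^n) - mu(x)| <= eps * mu(x) for all x."""
    nu = empirical_distribution(seq, mu)
    return all(abs(nu[x] - mu[x]) <= eps * mu[x] for x in mu)


mu = {"a": 0.5, "b": 0.3, "c": 0.2}   # illustrative nominal distribution
seq = "aaaabbbbcc"                     # empirical distribution (0.4, 0.4, 0.2)
nu = empirical_distribution(seq, mu)
print(is_letter_typical(seq, mu, 0.35), round(tv_distance(nu, mu), 6))  # prints: True 0.2
```

Because each deviation is at most $\epsilon\mu(x)$ and the $\mu(x)$ sum to one, a typical sequence always yields a distance of at most $\epsilon$; here the distance is 0.2 while $\epsilon = 0.35$.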

There is an anthology of distances and distance metrics on the space of probability distributions which are related to the total variational distance [12], and therefore one can obtain various lower and upper bounds on the performance with respect to other classes of sources, based on (2).



Consider, for example, the case when $\nu \ll \mu$, $\forall \nu \in \mathbb{B}_R(\mu)$; by Pinsker's inequality [13]¹,

$$\|\nu - \mu\|_{TV}^2 \leq 2 D(\nu \| \mu), \quad \forall \nu \in \mathbb{B}_R(\mu),\ \nu \in \mathcal{P}(\Sigma),$$

where $D(\nu\|\mu) = \sum_{x \in \Sigma} \nu(x) \log \frac{\nu(x)}{\mu(x)}$ denotes the Kullback-Leibler distance (or relative entropy distance) between $\nu$ and $\mu$. Thus, Pinsker's inequality implies that the total variational distance class is larger than the class defined by replacing $\|\nu - \mu\|_{TV}$ by $D(\nu\|\mu)$; more precisely, $\{\nu : D(\nu\|\mu) \leq R^2/2\} \subseteq \mathbb{B}_R(\mu)$. Indeed, the total variational distance is more appropriate, especially when the probability distributions $\nu$ and $\mu$ are singular (resp. nearly singular), in which case $D(\nu\|\mu) = \infty$ (resp. very large), while $\|\nu - \mu\|_{TV} \leq 2$.
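The contrast between the two distances can be illustrated in a few lines. The Python sketch below (helper names are hypothetical) checks Pinsker's inequality on one pair of distributions and then evaluates a singular pair, for which the relative entropy is infinite while the total variational distance is only 2. It uses natural logarithms, the base for which the constant 2 in Pinsker's inequality is stated here.

```python
import math


def tv_distance(p, q):
    """||p - q||_TV = sum_x |p(x) - q(x)|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def kl_divergence(p, q):
    """D(p || q) in nats; +inf when p is not absolutely continuous w.r.t. q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue            # 0 log(0 / q) = 0 by convention
        if qi == 0.0:
            return math.inf     # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


mu = [0.5, 0.5]
nu = [0.7, 0.3]
print(tv_distance(nu, mu) ** 2 <= 2 * kl_divergence(nu, mu))   # Pinsker holds

# Singular pair: the relative entropy is infinite, the total variation is 2.
print(kl_divergence([0.0, 1.0], [1.0, 0.0]), tv_distance([0.0, 1.0], [1.0, 0.0]))
```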
The main contributions of this paper are the following.

1) The pay-off of maximizing the average codeword length over the total variational distance ball is transformed into a new optimization problem which is convex with respect to the codeword lengths.

2) The problem can be solved by convex optimization tools and in a waterfilling-like fashion (see Theorem 1), which requires numerical methods; no closed-form solution is provided. Note that this waterfilling structure does not belong to the family of waterfilling solutions for which practical algorithms were proposed by Palomar et al. [16].

3) The optimal code corresponding to the new optimization problem is then equivalent to a specific partition of the source alphabet, and re-normalization and merging of entries of the initial source probability vector, as a function of the radius of the ball $R \in [0,2]$, from which the optimal code is derived. An algorithm is presented which computes the weight vector $\nu$ with a worst-case computational complexity of order $O(n)$. Our approach provides a methodology for the solution of such problems and also an approach for this new waterfilling structure.

The paper is organized as follows. In the next section, we formulate the minimax length problem and derive its equivalent optimization. In Section III we show that optimization Problem 1 can be solved using convex optimization tools and a waterfilling approach. It is then transformed to an average coding problem (Problem 2), which is solved via a fast algorithm based on re-normalization of the initial source probabilities according to a merging rule. In Section IV, illustrative examples demonstrate the validity of the proposed algorithm and provide a better understanding of the impact of the distance parameter $R$ on the codeword lengths. The paper ends with the conclusions in Section V.



¹The bound is tight in the sense that the ratio of $D(\nu\|\mu)$ and $\|\nu - \mu\|_{TV}^2$ can be arbitrarily close to 1/2.



II. Problem Formulation

Consider a source generating outputs from a finite set of symbols, denoted by $\Sigma = \{x_1, x_2, \ldots, x_{|\Sigma|}\}$ of cardinality $|\Sigma|$, according to a source probability distribution $\nu = \{\nu(x) : x \in \Sigma\} = (\nu(x_1), \nu(x_2), \ldots, \nu(x_{|\Sigma|}))$. Source symbols are encoded into $D$-ary codewords (unless specified otherwise, $\log(\cdot) = \log_D(\cdot)$). A code $\mathcal{C} = \{c(x) : x \in \Sigma\}$ for symbols in $\Sigma$ with image alphabet $\mathcal{D} = \{0, 1, 2, \ldots, D-1\}$ is an injective map $c : \Sigma \to \mathcal{D}^*$, where $\mathcal{D}^*$ is the set of finite sequences drawn from $\mathcal{D}$. For $x \in \Sigma$, each codeword $c(x) \in \mathcal{D}^*$, $c \in \mathcal{C}$, is identified with a codeword length $l(x) \in \mathbb{Z}_+$, where $\mathbb{Z}_+$ is the set of non-negative integers. Thus, a code $\mathcal{C}$ for source symbols from the alphabet $\Sigma$ is associated with the length function of the code $l : \Sigma \to \mathbb{Z}_+$, and a code defines a codeword length vector $\mathbf{l} = \{l(x) : x \in \Sigma\} = (l(x_1), l(x_2), \ldots, l(x_{|\Sigma|})) \in \mathbb{Z}_+^{|\Sigma|}$. If, however, the integer constraint is relaxed by admitting real-valued length vectors $\mathbf{l} \in \mathbb{R}_+^{|\Sigma|}$ which satisfy the Kraft inequality (i.e., $\sum_{x \in \Sigma} D^{-l(x)} \leq 1$), then $\mathbb{Z}_+$ is replaced by $\mathbb{R}_+$. Such codes give approximate solutions which are less computationally intensive [1].

Suppose the source probability distribution $\nu$, henceforth called the true distribution, is unknown, while modeling techniques give access to a nominal source probability distribution $\mu = \{\mu(x) : x \in \Sigma\} = (\mu(x_1), \mu(x_2), \ldots, \mu(x_{|\Sigma|}))$. Having constructed the nominal source distribution, one may estimate from empirical data, via counting techniques, the distance between the two distributions with respect to the total variation norm $\|\nu - \mu\|_{TV}$. This provides an estimate of the radius $R$ such that $\|\nu - \mu\|_{TV} \leq R$, and hence characterizes the set $\mathbb{B}_R(\mu)$ of all possible true distributions of the source. Subsequently, the source coding problem for the class of sources $\mathbb{B}_R(\mu)$ can be defined via minimax techniques as follows. Let $\mathcal{P}(\Sigma)$ denote the set of probability distributions on the alphabet $\Sigma$, and let $\mathcal{P}_\mu(\Sigma)$ denote the set of nominal probability distributions, ordered non-increasingly, defined by

$$\mathcal{P}_\mu(\Sigma) = \Big\{ \mu = (\mu(x_1), \ldots, \mu(x_{|\Sigma|})) \in \mathbb{R}^{|\Sigma|} : 0 < \mu(x_i) \leq \mu(x_j),\ \forall i > j,\ (x_i, x_j) \in \Sigma,\ \sum_{x \in \Sigma} \mu(x) = 1 \Big\}.$$

The precise problem investigated is stated below. 
Problem 1. Given a fixed nominal distribution $\mu \in \mathcal{P}_\mu(\Sigma)$ and distance parameter $R \in [0,2]$, define the class of source probability distributions by the total variational ball

$$\mathbb{B}_R(\mu) = \big\{ \nu \in \mathcal{P}(\Sigma) : \|\nu - \mu\|_{TV} \leq R \big\}, \tag{3}$$

and the average codeword length pay-off with respect to the true source probability distribution $\nu \in \mathbb{B}_R(\mu) \subset \mathcal{P}(\Sigma)$ by

$$L(\mathbf{l}, \nu) = \sum_{x \in \Sigma} l(x)\nu(x). \tag{4}$$

The objective is to find a prefix code length vector $\mathbf{l}^\dagger \in \mathbb{R}_+^{|\Sigma|}$ (satisfying the Kraft inequality) which minimizes the maximum average codeword length pay-off defined by

$$L_R(\mathbf{l}, \nu^*) = \max_{\nu \in \mathbb{B}_R(\mu)} \sum_{x \in \Sigma} l(x)\nu(x), \tag{5}$$

for all $R \in [0,2]$.



The characterization of the optimal prefix code length vector $\mathbf{l}^\dagger \in \mathbb{R}_+^{|\Sigma|}$ is obtained by first converting $L_R(\mathbf{l}, \nu^*)$ into an equivalent pay-off and then using the resulting pay-off to find the optimal code.

III. Main results 

The objective of this section is twofold. First, to solve Problem 1 using an equivalent pay-off for which the optimal prefix code length vector $\mathbf{l}^\dagger \in \mathbb{R}_+^{|\Sigma|}$ is obtained using a waterfilling-like approach. Second, to find an explicit expression of the maximizing distribution $\nu^* \in \mathbb{B}_R(\mu)$, to derive certain properties of the maximizing distribution, and to identify how these properties are transformed into equivalent properties of the optimal codeword length vector. The main goal here is to identify how symbols are merged together, and how the merging changes as a function of the parameter $R \in [0,2]$, so that the optimal solution is characterized for all $R \in [0,2]$. From these properties the Shannon codeword lengths for Problem 1 will be found.

A. Equivalent Pay-off and Waterfilling-Like Solution 

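Let $\mathcal{M}_{sm}(\Sigma)$ denote the set of finite signed measures on $\Sigma$. Then, any $\eta \in \mathcal{M}_{sm}(\Sigma)$ has a Jordan decomposition $\{\eta^+, \eta^-\}$ such that $\eta = \eta^+ - \eta^-$, and the total variation of $\eta$ is defined by $\|\eta\|_{TV} = \eta^+(\Sigma) + \eta^-(\Sigma)$. Define the subset $\mathcal{M}_0(\Sigma) = \{\eta \in \mathcal{M}_{sm}(\Sigma) : \eta(\Sigma) = 0\} \subset \mathcal{M}_{sm}(\Sigma)$. For $\xi \in \mathcal{M}_0(\Sigma)$, $\xi(\Sigma) = 0$ implies $\xi^+(\Sigma) = \xi^-(\Sigma)$, and hence $\xi^+(\Sigma) = \xi^-(\Sigma) = \frac{\|\xi\|_{TV}}{2}$. Define $\xi = \nu - \mu \in \mathcal{M}_0(\Sigma)$. Since the lengths $\mathbf{l} \in \mathbb{R}_+^{|\Sigma|}$ are non-negative, the following inequalities are obtained:

$$\begin{aligned}
\sum_{x \in \Sigma} l(x)\nu(x) &= \sum_{x \in \Sigma} l(x)\xi(x) + \sum_{x \in \Sigma} l(x)\mu(x) \\
&= \sum_{x \in \Sigma} l(x)\big(\xi^+(x) - \xi^-(x)\big) + \sum_{x \in \Sigma} l(x)\mu(x) \\
&\leq \max_{x \in \Sigma} l(x)\, \xi^+(\Sigma) - \min_{x \in \Sigma} l(x)\, \xi^-(\Sigma) + \sum_{x \in \Sigma} l(x)\mu(x) \\
&= \max_{x \in \Sigma} l(x) \frac{\|\xi\|_{TV}}{2} - \min_{x \in \Sigma} l(x) \frac{\|\xi\|_{TV}}{2} + \sum_{x \in \Sigma} l(x)\mu(x) \\
&= \Big\{ \max_{x \in \Sigma} l(x) - \min_{x \in \Sigma} l(x) \Big\} \frac{\|\xi\|_{TV}}{2} + \sum_{x \in \Sigma} l(x)\mu(x).
\end{aligned} \tag{6}$$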

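For a given $\mu \in \mathcal{P}_\mu(\Sigma)$, define the set $\tilde{\mathbb{B}}_R(\mu)$ by

$$\tilde{\mathbb{B}}_R(\mu) = \big\{ \xi \in \mathcal{M}_0(\Sigma) : \xi = \nu - \mu,\ \nu \in \mathcal{P}(\Sigma),\ \|\xi\|_{TV} \leq R \big\}. \tag{7}$$

For any $\xi \in \tilde{\mathbb{B}}_R(\mu)$, $\xi = (\nu - \mu)^+ - (\nu - \mu)^- = \xi^+ - \xi^-$. Moreover, the upper bound in the right-hand side of (6) is achieved by $\xi^* \in \tilde{\mathbb{B}}_R(\mu)$ as follows.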



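Let

$$x^0 \in \Sigma^0 = \big\{ x \in \Sigma : l(x) = \max\{l(x) : x \in \Sigma\} \equiv l_{\max} \big\},$$
$$x_0 \in \Sigma_0 = \big\{ x \in \Sigma : l(x) = \min\{l(x) : x \in \Sigma\} \equiv l_{\min} \big\}.$$

Take

$$\xi^*(x) = \nu^*(x) - \mu(x) = \frac{R}{2}\big(\delta_{x^0}(x) - \delta_{x_0}(x)\big), \quad x \in \Sigma, \tag{8}$$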

where $\delta_y(x)$ denotes the point mass distribution concentrated at $y \in \Sigma$. This is indeed a signed measure with total variation $\|\nu^* - \mu\|_{TV} = R$, and $\sum_{x \in \Sigma} l(x)(\nu^* - \mu)(x) = \frac{R}{2}(l_{\max} - l_{\min})$. Hence, by using (8) as a candidate for the maximizing distribution,

$$\sum_{x \in \Sigma} l(x)\nu^*(x) = \frac{R}{2}\Big\{ \max_{x \in \Sigma} l(x) - \min_{x \in \Sigma} l(x) \Big\} + \sum_{x \in \Sigma} l(x)\mu(x), \tag{9}$$

where $\nu^*$ satisfies the constraint $\|\xi^*\|_{TV} = \|\nu^* - \mu\|_{TV} = R$.
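The construction (8) and the bound (9) can be sanity-checked numerically. The sketch below moves $R/2$ of mass from a shortest-codeword symbol to a longest-codeword symbol, compares the resulting average length with the right-hand side of (9), and samples random distributions in $\mathbb{B}_R(\mu)$ to confirm that none exceeds it. It assumes $\mu(x_0) \geq R/2$, so that the perturbed vector is still a probability vector; all function names and the example values are hypothetical.

```python
import random


def average_length(l, p):
    """sum_x l(x) p(x)."""
    return sum(li * pi for li, pi in zip(l, p))


def bound_9(l, mu, R):
    """Right-hand side of (9): (R/2)(max l - min l) + sum_x l(x) mu(x)."""
    return (R / 2) * (max(l) - min(l)) + average_length(l, mu)


def perturbation_8(l, mu, R):
    """Candidate maximizer (8): shift R/2 of mass from a shortest-length
    symbol x_0 to a longest-length symbol x^0 (assumes mu(x_0) >= R/2)."""
    i_top = max(range(len(l)), key=lambda i: l[i])   # x^0 in Sigma^0
    i_bot = min(range(len(l)), key=lambda i: l[i])   # x_0 in Sigma_0
    if mu[i_bot] < R / 2:
        raise ValueError("construction (8) needs mu(x_0) >= R/2")
    nu = list(mu)
    nu[i_top] += R / 2
    nu[i_bot] -= R / 2
    return nu


def never_exceeds_bound(l, mu, R, trials=20000, seed=1):
    """Sample distributions in B_R(mu); none may beat the bound (9)."""
    rng = random.Random(seed)
    for _ in range(trials):
        w = [rng.random() for _ in mu]
        s = sum(w)
        nu = [wi / s for wi in w]
        if sum(abs(a - b) for a, b in zip(nu, mu)) <= R:
            if average_length(l, nu) > bound_9(l, mu, R) + 1e-12:
                return False
    return True


mu, l, R = [0.5, 0.25, 0.25], [1.0, 2.0, 2.0], 0.4
nu_star = perturbation_8(l, mu, R)        # [0.3, 0.45, 0.25]
print(average_length(l, nu_star), bound_9(l, mu, R))
```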

Thus, $L_R(\mathbf{l}, \nu^*)$ in (5) is equivalent to pay-off (9). At this stage it is clear that Problem 1 is equivalent to minimizing (9) subject to the Kraft inequality. This problem can be solved by a wide variety of convex optimization methods; in the following theorem we provide a waterfilling-like solution obtained via the Karush-Kuhn-Tucker theorem. Before we proceed further, we discuss some generalizations.

Remark 1. The derivation leading to (9) is generic in the sense that it is an optimization of a linear functional over the total variational ball, and hence it is applicable to a variety of problems. Below, we discuss two generalizations.

1) Theorem 1 holds for countable alphabets $\Sigma$, since the derivations do not depend on any assumption on the cardinality of $\Sigma$.

2) The derivation leading to (9) holds for abstract alphabets, such as complete separable metric spaces with the $\sigma$-algebra of Borel sets in $\Sigma$, with the following modifications: $\nu, \mu$ are probability measures on $\Sigma$, $l$ is a non-negative bounded continuous function $l : \Sigma \to [0, \infty)$, the sums $\sum_{x \in \Sigma} l(x)\nu(x)$, $\sum_{x \in \Sigma} l(x)\mu(x)$ are replaced by the integrals $\int_\Sigma l(x)\nu(dx)$, $\int_\Sigma l(x)\mu(dx)$, and the min, max operations are replaced by inf, sup operations (unless $\Sigma$ is compact). In this case, for any $l$ which is bounded continuous and non-negative, from (9) we have

$$\int_\Sigma l(x)\nu^*(dx) = \frac{R}{2}\Big\{ \sup_{x \in \Sigma} l(x) - \inf_{x \in \Sigma} l(x) \Big\} + \int_\Sigma l(x)\mu(dx) \tag{10}$$

and

$$\int_{\Sigma^0} \nu^*(dx) = \mu(\Sigma^0) + \frac{R}{2} \in [0,1], \quad \int_{\Sigma_0} \nu^*(dx) = \mu(\Sigma_0) - \frac{R}{2} \in [0,1],$$
$$\nu^*(A) = \mu(A), \quad \forall A \subseteq \Sigma \setminus (\Sigma^0 \cup \Sigma_0). \tag{11}$$



Moreover, even in this abstract case, the first right-hand-side term of (10) is related to the oscillation semi-norm of $l$ by

$$\mathrm{osc}(l) = \sup_{(x,y) \in \Sigma \times \Sigma} |l(x) - l(y)| = 2 \inf_{c \in \mathbb{R}} \|l - c\|_\infty = \sup_{x \in \Sigma} l(x) - \inf_{x \in \Sigma} l(x). \tag{12}$$

Although generalization 2) is not pursued in this paper, one can infer that the generic result is of interest for classes of distributions on abstract alphabets.
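Identity (12) is easy to confirm on a finite alphabet, where sup and inf reduce to max and min. The sketch below compares the three expressions for a sample length function; the infimum over constants is approximated by a grid search, which is only a numerical stand-in for the exact infimum (attained at the midpoint of the range of $l$). Names and values are illustrative.

```python
def osc(l):
    """Oscillation semi-norm: sup over pairs (x, y) of |l(x) - l(y)|."""
    return max(abs(a - b) for a in l for b in l)


def sup_distance_to_constant(l, c):
    """||l - c||_inf for the constant function c."""
    return max(abs(a - c) for a in l)


l = [1.0, 2.5, 4.0, 3.0]
mid = (max(l) + min(l)) / 2                   # best constant approximation
grid = [i / 100 for i in range(0, 501)]       # candidate constants c in [0, 5]
best = min(sup_distance_to_constant(l, c) for c in grid)
print(osc(l), 2 * sup_distance_to_constant(l, mid), 2 * best, max(l) - min(l))
# prints: 3.0 3.0 3.0 3.0
```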
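Theorem 1. Consider pay-off $L_R(\mathbf{l}, \nu)$ and real-valued prefix codes. Let $\underline{w}$ and $\overline{w}$ be such that

$$\sum_{x \in \Sigma} \big(\underline{w} - \mu(x)\big)^+ = \frac{R}{2}, \tag{13}$$

and

$$\sum_{x \in \Sigma} \big(\mu(x) - \overline{w}\big)^+ = \frac{R}{2}, \tag{14}$$

where $(f)^+ = \max(0, f)$ and $R \in [0,2]$. The distribution $\nu^* \in \mathbb{B}_R(\mu)$ associated with the code that minimizes the maximum average codeword length pay-off $L_R(\mathbf{l}, \nu^*)$, for all $R \in [0,2]$, is given by

$$\nu^*(x) = \begin{cases} \overline{w} & \text{if } \mu(x) \geq \overline{w}, \\ \mu(x) & \text{if } \underline{w} < \mu(x) < \overline{w}, \\ \underline{w} & \text{if } \mu(x) \leq \underline{w}. \end{cases} \tag{15}$$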



Proof: See Appendix A-A.
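Because the left-hand sides of (13) and (14) are monotone in $\underline{w}$ and $\overline{w}$, respectively, both levels can be found numerically by bisection and then used to clip $\mu$ as in (15). The sketch below is our illustration, not the authors' implementation; the function name, the iteration count, and the uniform fallback when the two levels cross (mirroring the later statement that all weights become equal once $\alpha = R/2$ reaches $\alpha_{\max}$) are assumptions.

```python
import math


def waterfill(mu, R, iters=200):
    """Solve (13)-(14) by bisection and build nu* of (15).

    mu: nominal probabilities; R: total variational radius in [0, 2].
    Returns (nu, w_low, w_up). If the two levels cross, the uniform
    vector is returned (assumed no-compression regime)."""
    a = R / 2

    def deficit(w):   # sum_x (w - mu(x))^+, non-decreasing in w   -> (13)
        return sum(max(0.0, w - m) for m in mu)

    def excess(w):    # sum_x (mu(x) - w)^+, non-increasing in w   -> (14)
        return sum(max(0.0, m - w) for m in mu)

    lo, hi = 0.0, 1.0
    for _ in range(iters):            # smallest w with deficit(w) >= a
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if deficit(mid) < a else (lo, mid)
    w_low = hi

    lo, hi = 0.0, 1.0
    for _ in range(iters):            # smallest w with excess(w) <= a
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if excess(mid) > a else (lo, mid)
    w_up = hi

    if w_low >= w_up:
        return [1.0 / len(mu)] * len(mu), w_low, w_up
    nu = [min(max(m, w_low), w_up) for m in mu]   # clipping as in (15)
    return nu, w_low, w_up


mu = [0.5, 0.25, 0.15, 0.1]
nu, w_low, w_up = waterfill(mu, R=0.1)             # nu ~ [0.45, 0.25, 0.15, 0.15]
lengths = [-math.log2(v) for v in nu]              # real-valued lengths, D = 2
```

For this example $\overline{w} = 0.45$ and $\underline{w} = 0.15$: the largest probability is lowered, the smallest is raised, and the total variational distance of the result from $\mu$ equals $R = 0.1$.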



An example of the solution to the coding problem with real-valued prefix codes for a total variational distance ball is obtained from Theorem 1 and is depicted in Fig. 1.



[Figure: weight versus symbol for the nominal probabilities $\mu(x_1), \ldots, \mu(x_7)$.]

Fig. 1. Example demonstrating the solution of the coding problem in a waterfilling-like fashion. In the example of the figure, $\nu^* = \{\overline{w}, \overline{w}, \overline{w}, \mu(x_4), \mu(x_5), \underline{w}, \underline{w}\}$.



A similar problem is considered in [17], where the Shannon entropy of an unknown distribution is maximized subject to a variational distance constraint between a nominal distribution and the unknown distribution. With a completely different approach, the authors of [17] provide a solution similar to the waterfilling approach described in this section, which, however, cannot incorporate classes of sources on abstract alphabets.
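B. Optimal Weights and Merging Rule

The pay-off $L_R(\mathbf{l}, \nu^*)$ can be written as

$$L_R(\mathbf{l}, \nu^*) = \sum_{x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)} l(x)\mu(x) + \Big( \sum_{x \in \Sigma^0} \mu(x) + \frac{R}{2} \Big) l_{\max} + \Big( \sum_{x \in \Sigma_0} \mu(x) - \frac{R}{2} \Big) l_{\min}, \tag{16}$$

where

$$\sum_{x \in \Sigma^0} \nu^*(x) = \sum_{x \in \Sigma^0} \mu(x) + \frac{R}{2} \in [0,1], \quad \sum_{x \in \Sigma_0} \nu^*(x) = \sum_{x \in \Sigma_0} \mu(x) - \frac{R}{2} \in [0,1],$$
$$\nu^*(x) = \mu(x),\ \forall x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0), \quad 0 \leq \nu^*(x) \leq 1,\ \forall x \in \Sigma.$$

The above expression makes the dependence on the disjoint sets $\Sigma^0$, $\Sigma_0$ and $\Sigma \setminus (\Sigma^0 \cup \Sigma_0)$ explicit. The sets remain to be identified so that a solution to the coding problem exists for all $R \in [0,2]$. Note that $l_{\min}$, $l_{\max}$ and the sets $\Sigma^0$ and $\Sigma_0$ depend parametrically on $R \in [0,2]$. This explicit dependence will often be omitted for simplicity of notation.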

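Define $\alpha = R/2$; then Problem 1 becomes equivalent to Problem 2 stated below.

Problem 2. Given a fixed nominal distribution $\mu \in \mathcal{P}_\mu(\Sigma)$ and distance parameter $\alpha \in [0,1]$, define the pay-off as follows:

$$L_\alpha(\mathbf{l}, \mu) = \sum_{x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)} l(x)\mu(x) + \Big( \sum_{x \in \Sigma^0} \mu(x) + \alpha \Big) l_{\max} + \Big( \sum_{x \in \Sigma_0} \mu(x) - \alpha \Big) l_{\min}. \tag{17}$$

The objective is to find a prefix code length vector $\mathbf{l}^\dagger \in \mathbb{R}_+^{|\Sigma|}$ which minimizes the pay-off $L_\alpha(\mathbf{l}, \mu)$, for all $\alpha \in [0,1]$, such that the Kraft inequality holds, i.e., $\sum_{x \in \Sigma} D^{-l(x)} \leq 1$.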

In this section, the optimal real-valued prefix codeword length vector $\mathbf{l}^\dagger$ minimizing pay-off $L_\alpha(\mathbf{l}, \mu)$, as a function of $\alpha \in [0,1]$ and the initial source probability vector $\mu$, is recursively calculated via re-normalization and merging. For any specific $\alpha \in [0,1]$, a fast algorithm (of linear complexity in the worst case) is devised which obtains the optimal real-valued prefix codeword lengths minimizing pay-off $L_\alpha(\mathbf{l}, \mu)$.
Define

$$\sum_{x \in \Sigma^0} \nu_\alpha(x) = \sum_{x \in \Sigma^0} \mu(x) + \alpha \in [0,1], \tag{18a}$$
$$\sum_{x \in \Sigma_0} \nu_\alpha(x) = \sum_{x \in \Sigma_0} \mu(x) - \alpha \in [0,1], \tag{18b}$$
$$\nu_\alpha(x) = \mu(x), \quad x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0). \tag{18c}$$
Using (17) and (18), the pay-off $L_\alpha(\mathbf{l}, \mu)$ is written as a function of the new weight vector as follows:

$$L_\alpha(\mathbf{l}, \mu) = L(\mathbf{l}, \nu_\alpha) = \sum_{x \in \Sigma} \nu_\alpha(x) l(x), \quad \alpha \in [0,1]. \tag{19}$$

The new weight vector $\nu_\alpha$ is a function of $\alpha$ and the source probability vector $\mu \in \mathcal{P}_\mu(\Sigma)$, and it is defined over the three disjoint sets $\Sigma^0$, $\Sigma_0$ and $\Sigma \setminus (\Sigma^0 \cup \Sigma_0)$. It can be easily verified that $0 \leq \nu_\alpha(x) \leq 1$, $\forall x \in \Sigma^0 \cup \Sigma_0$ (if any of the weights were negative, one could choose a very large $l(x)$ and the pay-off $L_\alpha(\mathbf{l}, \mu) = L(\mathbf{l}, \nu_\alpha)$ would be negative), and

$$\sum_{x \in \Sigma} \nu_\alpha(x) = 1, \quad \forall \alpha \in [0,1].$$
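The identity (19) between the pay-off (17) and the average length under the weights (18) can be checked on a concrete length vector. In the sketch below, $\Sigma^0$ and $\Sigma_0$ are taken as the arg-max and arg-min sets of a non-constant $\mathbf{l}$. Since (18a)-(18b) only fix the totals over $\Sigma^0$ and $\Sigma_0$, the weights are split evenly inside each set; this split is our assumption and does not affect (19), because $l$ is constant on each set. Function names and numbers are hypothetical.

```python
def payoff_17(l, mu, alpha):
    """L_alpha(l, mu) as in (17); Sigma^0 = argmax l, Sigma_0 = argmin l.
    Assumes l is not constant, so that Sigma^0 and Sigma_0 are disjoint."""
    l_max, l_min = max(l), min(l)
    top = [i for i, li in enumerate(l) if li == l_max]      # Sigma^0
    bot = [i for i, li in enumerate(l) if li == l_min]      # Sigma_0
    rest = [i for i in range(len(l)) if i not in top and i not in bot]
    return (sum(l[i] * mu[i] for i in rest)
            + (sum(mu[i] for i in top) + alpha) * l_max
            + (sum(mu[i] for i in bot) - alpha) * l_min)


def weights_18(l, mu, alpha):
    """A weight vector satisfying (18a)-(18c), with an even split inside
    Sigma^0 and Sigma_0 (assumption; only the totals are fixed by (18))."""
    l_max, l_min = max(l), min(l)
    top = [i for i, li in enumerate(l) if li == l_max]
    bot = [i for i, li in enumerate(l) if li == l_min]
    nu = list(mu)
    for i in top:
        nu[i] = (sum(mu[j] for j in top) + alpha) / len(top)
    for i in bot:
        nu[i] = (sum(mu[j] for j in bot) - alpha) / len(bot)
    return nu


mu, l, alpha = [0.5, 0.25, 0.15, 0.1], [1.0, 2.0, 3.0, 3.0], 0.05
nu = weights_18(l, mu, alpha)                     # [0.45, 0.25, 0.15, 0.15]
print(payoff_17(l, mu, alpha), sum(v * li for v, li in zip(nu, l)))  # (17) vs (19)
```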

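Lemma 1. The real-valued prefix codes minimizing pay-off $L_\alpha(\mathbf{l}, \mu)$ for $\alpha \in [0,1]$ are given by

$$l^\dagger(x) = -\log \nu_\alpha(x) = \begin{cases} -\log \mu(x), & x \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0), \\ -\log \Big( \dfrac{\sum_{x \in \Sigma_0} \mu(x) - \alpha}{|\Sigma_0|} \Big), & x \in \Sigma_0, \\ -\log \Big( \dfrac{\sum_{x \in \Sigma^0} \mu(x) + \alpha}{|\Sigma^0|} \Big), & x \in \Sigma^0, \end{cases} \tag{20}$$

where $\Sigma_0$ and $\Sigma^0$ remain to be specified.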



Proof: See Appendix A-B 



The point to be made regarding Lemma 1 is twofold: (a) since, for $\alpha \in [0,1]$, the pay-off $L_\alpha(\mathbf{l}, \mu)$ is continuous in $\mathbf{l}$ and the constraint set defined by the Kraft inequality is closed and bounded, and hence compact, an optimal code length vector $\mathbf{l}^\dagger$ exists; and (b) the optimal code is given by (20).



From the characterization of the optimal code length vector in Lemma 1, it follows that $L_\alpha(\mathbf{l}^\dagger, \mu) = -\sum_{x \in \Sigma} \nu_\alpha(x) \log \nu_\alpha^\dagger(x) \geq H(\nu_\alpha)$, where $H(\nu_\alpha)$ denotes the entropy of the probability distribution $\nu_\alpha$. Equality holds if, and only if, $\nu_\alpha(x) = \nu_\alpha^\dagger(x)$, $\forall x \in \Sigma$. Therefore, for $\alpha \in [0,1]$, the weights satisfying (18) and corresponding to the optimal code length vector are uniquely represented via $\nu_\alpha = \nu_\alpha^\dagger$. Further, by rounding up the optimal codeword lengths (i.e., $l^\dagger(x) = \lceil -\log \nu_\alpha^\dagger(x) \rceil$), the Kraft inequality remains valid and hence $H(\nu_\alpha) \leq \sum_{x \in \Sigma} l^\dagger(x)\nu_\alpha(x) < H(\nu_\alpha) + 1$. The next lemma describes monotonicity properties of the weight vector $\nu_\alpha$ as a function of the probability vector $\mu$, for all $\alpha \in [0,1]$.
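The rounding argument can be checked on a concrete weight vector. This Python sketch assumes binary codes ($D = 2$) and takes $\nu_\alpha = (0.45, 0.25, 0.15, 0.15)$ as an arbitrary example; it computes the real-valued lengths $-\log \nu_\alpha(x)$ of (20), rounds them up, and verifies the Kraft inequality and the entropy bounds. The helper names are hypothetical.

```python
import math


def shannon_lengths(weights, D=2):
    """Real-valued lengths -log_D nu(x) from (20) and their rounded-up integers."""
    real = [-math.log2(w) / math.log2(D) for w in weights]
    return real, [math.ceil(r) for r in real]


def entropy(weights, D=2):
    """H(nu) = -sum_x nu(x) log_D nu(x)."""
    return -sum(w * math.log2(w) for w in weights if w > 0) / math.log2(D)


D = 2
nu = [0.45, 0.25, 0.15, 0.15]           # example weights nu_alpha
real, rounded = shannon_lengths(nu, D)
H = entropy(nu, D)
avg = sum(w * l for w, l in zip(nu, rounded))
print(sum(D ** -l for l in real), sum(D ** -l for l in rounded), H <= avg < H + 1)
```

The real-valued lengths meet the Kraft inequality with equality (their Kraft sum is $\sum_x \nu_\alpha(x) = 1$), while the rounded lengths $(2, 2, 3, 3)$ give a Kraft sum of 0.75 and an average length within one unit of $H(\nu_\alpha)$.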

Lemma 2. Consider pay-off $L_\alpha(\mathbf{l}, \mu)$ and real-valued prefix codes. The following hold:

1) For $\{x, y\} \subset \Sigma$, if $\mu(x) \leq \mu(y)$ then $\nu_\alpha(x) \leq \nu_\alpha(y)$, for all $\alpha \in [0,1]$. Equivalently, $\mu(x_1) \geq \mu(x_2) \geq \ldots \geq \mu(x_{|\Sigma|}) > 0$ implies $\nu_\alpha(x_1) \geq \nu_\alpha(x_2) \geq \ldots \geq \nu_\alpha(x_{|\Sigma|}) > 0$, for all $\alpha \in [0,1]$.

2) For $y \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$, $\nu_\alpha(y)$ is constant and independent of $\alpha \in [0,1]$.

3) For $x \in \Sigma^0$, $\nu_\alpha(x)$ is a monotonically increasing function of $\alpha \in [0,1]$.

4) For $x \in \Sigma_0$, $\nu_\alpha(x)$ is a monotonically decreasing function of $\alpha \in [0,1]$.



Proof: See Appendix A-C 
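The properties of Lemma 2 can be observed by sweeping $\alpha$. The sketch below uses a hypothetical helper that solves (13)-(14) exactly, with $R = 2\alpha$, by scanning prefix sums of the sorted probabilities instead of bisecting, and then clips $\mu$ as in (15). The example vector and the step of 0.01 in $\alpha$ are arbitrary choices; returning the uniform vector once the two levels cross is an assumption matching the no-compression regime.

```python
def levels(mu, alpha):
    """Exact levels (w_low, w_up) solving (13)-(14) with R = 2 * alpha."""
    down = sorted(mu, reverse=True)
    s = 0.0
    for k, m in enumerate(down, start=1):      # clip the k largest to w_up
        s += m
        w_up = (s - alpha) / k
        if k == len(down) or w_up >= down[k]:
            break
    up = sorted(mu)
    s = 0.0
    for k, m in enumerate(up, start=1):        # lift the k smallest to w_low
        s += m
        w_low = (s + alpha) / k
        if k == len(up) or w_low <= up[k]:
            break
    return w_low, w_up


def weights(mu, alpha):
    w_low, w_up = levels(mu, alpha)
    if w_low >= w_up:                          # all weights have merged
        return [1.0 / len(mu)] * len(mu)
    return [min(max(m, w_low), w_up) for m in mu]


mu = [0.5, 0.25, 0.15, 0.1]                    # sorted non-increasingly
path = [weights(mu, a / 100) for a in range(0, 31)]   # alpha = 0.00, ..., 0.30
for a in (0.0, 0.1, 0.3):
    print(a, weights(mu, a))
```

Along the sweep the weights stay ordered, $\nu_\alpha(x_1)$ never increases, $\nu_\alpha(x_4)$ never decreases, and $\nu_\alpha(x_2) = 0.25$ stays constant because $x_2$ is never absorbed before all weights become equal.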



Next, the merging rule which describes how the weight vector $\nu_\alpha$ changes as a function of $\alpha \in [0,1]$ is identified, such that a solution to the coding problem is completely characterized for arbitrary cardinalities $|\Sigma^0|$ and $|\Sigma_0|$, and not necessarily distinct probabilities, for any $\alpha \in [0,1]$. Clearly, there is a minimum $\alpha$, called $\alpha_{\max}$, such that for any $\alpha \in [\alpha_{\max}, 1]$ there is no compression.

Consider the complete characterization of the solution, as $\alpha$ ranges over $[0,1]$, for any initial probability vector $\mu$ (not necessarily consisting of distinct entries). Then, $|\Sigma_0| + |\Sigma^0| \in \{1, 2, \ldots, |\Sigma|-1\}$, while for $|\Sigma_0| + |\Sigma^0| = |\Sigma|$, $\alpha \in [\alpha_{\max}, 1]$, there is no compression since the weights are all equal.
Define

$$\beta_{k_1} = \min\big\{ \beta \in [0,1] : \nu_\beta(x_{|\Sigma|-(k_1-1)}) = \nu_\beta(x_{|\Sigma|-k_1}) \big\}, \quad k_1 \in \{1, \ldots, |\Sigma|-1\}, \quad \beta_0 = 0,$$
$$\gamma_{k_2} = \min\big\{ \gamma \in [0,1] : \nu_\gamma(x_{k_2}) = \nu_\gamma(x_{k_2+1}) \big\}, \quad k_2 \in \{1, \ldots, |\Sigma|-1\}, \quad \gamma_0 = 0,$$
$$\alpha_k = \max\{\beta_{k_1}, \gamma_{k_2}\}, \quad k = k_1 + k_2, \quad \alpha_0 = 0.$$

By Lemma 2 the weights are ordered; hence $\alpha_1$ is the smallest value of $\alpha \in [0,1]$ for which two weights become equal. This can occur because the two smallest weights become equal ($\beta_1 < \gamma_1$), or because the two biggest weights become equal ($\gamma_1 < \beta_1$).

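Since, for $k = 0$, $\nu_{\alpha_0}(x) = \nu_0(x) = \mu(x)$, $\forall x \in \Sigma$, is the set of initial symbol probabilities, let $\Sigma^{0,0}$ denote the singleton set $\{x_{|\Sigma|}\}$ and $\Sigma_{0,0}$ denote the singleton set $\{x_1\}$. Specifically,

$$\Sigma^{0,0} = \Big\{ x \in \{x_{|\Sigma|}\} : \mu_* = \min_{x \in \Sigma} \mu(x) = \mu(x_{|\Sigma|}) \Big\}, \tag{21}$$
$$\Sigma_{0,0} = \Big\{ x \in \{x_1\} : \mu^* = \max_{x \in \Sigma} \mu(x) = \mu(x_1) \Big\}. \tag{22}$$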

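Similarly, $\Sigma^{0,1}$ is defined as the set of symbols in $\{x_{|\Sigma|-1}, x_{|\Sigma|}\}$ whose weight evaluated at $\beta_1$ is equal to the minimum weight $\nu_{\beta_1,*}$, and $\Sigma_{0,1}$ is defined as the set of symbols in $\{x_1, x_2\}$ whose weight evaluated at $\gamma_1$ is equal to the maximum weight $\nu^*_{\gamma_1}$:

$$\Sigma^{0,1} = \big\{ x \in \{x_{|\Sigma|-1}, x_{|\Sigma|}\} : \nu_{\beta_1}(x) = \nu_{\beta_1,*} \big\}, \tag{23}$$
$$\Sigma_{0,1} = \big\{ x \in \{x_1, x_2\} : \nu_{\gamma_1}(x) = \nu^*_{\gamma_1} \big\}. \tag{24}$$

In general, for a given value of $\alpha_k$, $k \in \{1, \ldots, |\Sigma|-1\}$, define

$$\Sigma^{0,k_1} = \big\{ x \in \{x_{|\Sigma|-k_1}, \ldots, x_{|\Sigma|}\} : \nu_{\beta_{k_1}}(x) = \nu_{\beta_{k_1},*} \big\}, \tag{25}$$
$$\Sigma_{0,k_2} = \big\{ x \in \{x_1, \ldots, x_{k_2}, x_{k_2+1}\} : \nu_{\gamma_{k_2}}(x) = \nu^*_{\gamma_{k_2}} \big\}, \tag{26}$$

and, for $k = k_1 + k_2$, $\alpha_k = \max\{\beta_{k_1}, \gamma_{k_2}\}$.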

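Lemma 3. Consider pay-off $L_\alpha(\mathbf{l}, \mu)$ and real-valued prefix codes. For $k_1, k_2 \in \{0, 1, 2, \ldots, |\Sigma|-1\}$,

$$\nu_\beta(x_{|\Sigma|-k_1}) = \ldots = \nu_\beta(x_{|\Sigma|}) = \nu_{\beta,*}, \quad \beta \in [\beta_{k_1}, \beta_{k_1+1}) \subset [0,1), \tag{27}$$
$$\nu_\gamma(x_1) = \ldots = \nu_\gamma(x_{k_2+1}) = \nu^*_\gamma, \quad \gamma \in [\gamma_{k_2}, \gamma_{k_2+1}) \subset [0,1). \tag{28}$$

Further, the cardinality of the sets $\Sigma^{0,k_1}$ and $\Sigma_{0,k_2}$ is $(k_1+1)$ and $(k_2+1)$, respectively.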



Proof: See Appendix A-D ■ 
The next theorem describes how the weight vector changes as a function of $\alpha \in [0,1]$, so that the solution of the coding problem can be characterized.

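Theorem 2. Consider pay-off $L_\alpha(\mathbf{l}, \mu)$ and real-valued prefix codes. For $\alpha \in [\alpha_k, \alpha_{k+1})$, $k \in \{0, 1, \ldots, |\Sigma|-1\}$, the optimal weights

$$\nu_\alpha^\dagger \triangleq \{\nu_\alpha^\dagger(x) : x \in \Sigma\} = \big(\nu_\alpha^\dagger(x_1), \nu_\alpha^\dagger(x_2), \ldots, \nu_\alpha^\dagger(x_{|\Sigma|})\big)$$

are given by

$$\nu_\alpha^\dagger(x) = \begin{cases} \dfrac{\sum_{x \in \Sigma_{0,k_2}} \mu(x) - \alpha}{1 + k_2}, & x \in \Sigma_{0,k_2}, \\ \mu(x), & x \in \Sigma \setminus (\Sigma^{0,k_1} \cup \Sigma_{0,k_2}), \\ \dfrac{\sum_{x \in \Sigma^{0,k_1}} \mu(x) + \alpha}{1 + k_1}, & x \in \Sigma^{0,k_1}, \end{cases} \tag{29}$$

where

$$\beta_{k_1+1} = (k_1+1)\,\mu(x_{|\Sigma|-(k_1+1)}) - \sum_{x \in \Sigma^{0,k_1}} \mu(x), \tag{30}$$
$$\gamma_{k_2+1} = \sum_{x \in \Sigma_{0,k_2}} \mu(x) - (k_2+1)\,\mu(x_{k_2+2}), \tag{31}$$
$$\alpha_{k+1} = \min\{\beta_{k_1+1}, \gamma_{k_2+1}\}. \tag{32}$$

Moreover, the minimum $\alpha$, called $\alpha_{\max}$, such that for $\alpha \in [\alpha_{\max}, 1]$ there is no compression, is given by

$$\alpha_{\max} = \frac{k_1^\dagger + 1}{|\Sigma|} - \sum_{x \in \Sigma^{0,k_1^\dagger}} \mu(x), \tag{33}$$

where $k_1^\dagger + 1$ is the number of probabilities $\mu(x)$, $x \in \Sigma$, that are less than $1/|\Sigma|$.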

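Proof: The derivation of Theorem 2 is based on the lemmas introduced prior to Theorem 2. By Lemma 3, for $\alpha \in [\alpha_k, \alpha_{k+1})$, the lowest probabilities that are equal change together, forming a total weight given by

$$\sum_{x \in \Sigma^{0,k_1}} \nu_\alpha(x) = |\Sigma^{0,k_1}|\, \nu_{\alpha,*} = \sum_{x \in \Sigma^{0,k_1}} \mu(x) + \alpha,$$

whereas the highest probabilities that are equal change together, forming a total weight given by

$$\sum_{x \in \Sigma_{0,k_2}} \nu_\alpha(x) = |\Sigma_{0,k_2}|\, \nu^*_\alpha = \sum_{x \in \Sigma_{0,k_2}} \mu(x) - \alpha.$$

At $\alpha = \beta_{k_1+1}$, each weight in $\Sigma^{0,k_1}$ is equal to $\mu(x_{|\Sigma|-(k_1+1)})$, and from Lemma 3 we have

$$\mu(x_{|\Sigma|-(k_1+1)}) = \frac{\sum_{x \in \Sigma^{0,k_1}} \mu(x) + \beta_{k_1+1}}{k_1+1} \implies \beta_{k_1+1} = (k_1+1)\,\mu(x_{|\Sigma|-(k_1+1)}) - \sum_{x \in \Sigma^{0,k_1}} \mu(x).$$

Similarly, it is shown for $\alpha = \gamma_{k_2+1}$ that

$$\gamma_{k_2+1} = \sum_{x \in \Sigma_{0,k_2}} \mu(x) - (k_2+1)\,\mu(x_{k_2+2}).$$

Once $\beta_{k_1+1}$ and $\gamma_{k_2+1}$ are found, $\alpha_{k+1}$ denotes the value of $\alpha$ at which merging occurs, and it is the smaller of $\beta_{k_1+1}$ and $\gamma_{k_2+1}$. The minimum $\alpha$, called $\alpha_{\max}$, such that for $\alpha \in [\alpha_{\max}, 1]$ there is no compression, is obtained when all the weights converge to the average probability, i.e., $\nu_\alpha(x) = 1/|\Sigma|$, $\forall x \in \Sigma$. This probability lies between two nominal probabilities whose weights converge to it, one from above and one from below. Hence, we can easily find the maximum cardinalities of $\Sigma^{0,k_1}$ and $\Sigma_{0,k_2}$. Once the cardinality is known, we can use one of the equations for finding $\beta_{k_1+1}$ and $\gamma_{k_2+1}$ to find $\alpha_{\max}$. Here, we use (30) with $\mu(x_{|\Sigma|-(k_1+1)})$ replaced by $1/|\Sigma|$, and $\alpha_{\max}$ can be expressed as follows:

$$\alpha_{\max} = \frac{k_1^\dagger + 1}{|\Sigma|} - \sum_{x \in \Sigma^{0,k_1^\dagger}} \mu(x) \in [0,1]. \tag{34}$$
■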
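As a numerical check of (34), the sketch below computes $\alpha_{\max}$ from the symbols whose probabilities lie below $1/|\Sigma|$ and, equivalently, as the total excess of the symbols above $1/|\Sigma|$; the two agree because the deviations $\mu(x) - 1/|\Sigma|$ sum to zero. The function names and test vectors are illustrative.

```python
def alpha_max(mu):
    """(34): (k1+1)/|Sigma| minus the sum of the k1+1 probabilities that are
    smaller than 1/|Sigma| (the symbols of Sigma^{0,k1} at alpha_max)."""
    n = len(mu)
    below = [m for m in mu if m < 1.0 / n]
    return len(below) / n - sum(below)


def alpha_max_from_top(mu):
    """Same quantity from the top group: total excess above 1/|Sigma|."""
    n = len(mu)
    return sum(m - 1.0 / n for m in mu if m > 1.0 / n)


for mu in ([0.5, 0.25, 0.15, 0.1], [0.7, 0.2, 0.1], [0.25] * 4):
    print(alpha_max(mu), alpha_max_from_top(mu))
```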



Theorem 2 facilitates the computation of the optimal real-valued prefix codeword length vector $\mathbf{l}^\dagger$ minimizing pay-off $L_\alpha(\mathbf{l}, \mu)$, as a function of $\alpha \in [0,1]$ and the initial source probability vector $\mu$, via re-normalization and merging. Specifically, the optimal weights are found by recursively calculating $\beta_{k_1}$, $k_1 \in \{0, 1, \ldots, |\Sigma|-1\}$, and $\gamma_{k_2}$, $k_2 \in \{0, 1, \ldots, |\Sigma|-1\}$, and hence $\alpha_k$, $k \in \{0, 1, \ldots, |\Sigma|-1\}$. For any specific $\alpha \in [0,1]$, an algorithm is given next which describes how to obtain the optimal real-valued prefix codeword lengths minimizing pay-off $L_\alpha(\mathbf{l}, \mu)$.

The main difference between the solutions emerging from Theorems 1 and 2 is the following. Theorem 1 simplifies the problem and its complexity by reducing it to the numerical solution of a waterfilling equation, while Theorem 2 finds an explicit expression of the weights. While both approaches solve the problem, the explicit expression of Theorem 2 reveals several properties of the solution and the impact of $\alpha$ on the optimal real-valued prefix codeword lengths.



C. An Algorithm for Computing the Optimal Weights 

For any probability distribution $\mu \in \mathcal{P}_\mu(\Sigma)$ and $\alpha \in [0,1]$, an algorithm is presented to compute the optimal weight vector $\nu_\alpha$ of Theorem 2. By Theorem 2 (see also Fig. 2 for a schematic representation of the weights for different values of $\alpha$), the weight vector $\nu_\alpha$ changes piecewise linearly as a function of $\alpha \in [0,1]$.
[Figure: weights versus $\alpha \in [0,1)$, with the merging points $\gamma = \gamma_1$, $\beta = \beta_1$, $\alpha = \alpha_1$, $\alpha = \alpha_2$, …, $\alpha = \alpha_{\max}$ and $\alpha = 1$ marked.]

Fig. 2. A schematic representation of the weights for different values of $\alpha$. The weight vector $\nu_\alpha$ changes piecewise linearly as a function of $\alpha \in [0,1]$.



Given a specific value of $\alpha \in [0,1]$, in order to calculate the weights $\nu_\alpha(x)$, it is sufficient to determine the values of $\alpha$ at the intersections by using (32), up to the first intersection that is greater than $\alpha$, or up to the last intersection (if all the intersections are smaller than $\alpha$) at $\alpha_{\max}$, beyond which there is no compression. For example, if $\alpha_1 < \alpha < \alpha_2$, find all $\alpha$'s at the intersections up to and including $\alpha_2$; subsequently, the weights at $\alpha$ can be found by using (29). Specifically, check first whether $\alpha \geq \alpha_{\max}$. If yes, then the weights are all equal to $1/|\Sigma|$. If $\alpha < \alpha_{\max}$, then find $\alpha_1, \ldots, \alpha_m$, $m \in \mathbb{N}$, $m \geq 1$, until $\alpha_{m-1} \leq \alpha < \alpha_m$. As soon as the $\alpha$'s at the intersections are found, the weights at $\alpha$ can be found by using (29). The algorithm is easy to implement and fast due to its low computational complexity. The worst-case scenario appears when $\alpha_{|\Sigma|-2} < \alpha < \alpha_{\max} = \alpha_{|\Sigma|-1}$, in which case all the $\alpha$'s at the intersections must be found. In general, the worst-case complexity of the algorithm is $O(n)$. The complete algorithm is given in Algorithm 1.
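Algorithm 1 Algorithm for Computing the Weight Vector $\nu_\alpha$

initialize
  $\mu = (\mu(x_1), \mu(x_2), \ldots, \mu(x_{|\Sigma|}))^T$, $\alpha = R/2$, $k = 0$, $k_1 = 0$, $k_2 = 0$, $\alpha_0 = \beta_0 = \gamma_0 = 0$
if $\alpha \geq \alpha_{\max}$ (computed from (34)) then
  return $\nu_\alpha(x_n) = 1/|\Sigma|$, $n = 1, \ldots, |\Sigma|$
end if
while $\alpha_k < \alpha$ do
  compute $\beta_{k_1+1}$ from (30) and $\gamma_{k_2+1}$ from (31)
  if $\beta_{k_1+1} < \gamma_{k_2+1}$ then
    $\alpha_{k+1} \leftarrow \beta_{k_1+1}$, $k \leftarrow k+1$, $k_1 \leftarrow k_1+1$, last $\leftarrow \beta$
  else if $\beta_{k_1+1} > \gamma_{k_2+1}$ then
    $\alpha_{k+1} \leftarrow \gamma_{k_2+1}$, $k \leftarrow k+1$, $k_2 \leftarrow k_2+1$, last $\leftarrow \gamma$
  else if $\beta_{k_1+1} = \gamma_{k_2+1}$ then
    $\alpha_{k+1} \leftarrow \beta_{k_1+1}$, $\alpha_{k+2} \leftarrow \gamma_{k_2+1}$, $k \leftarrow k+2$, $k_1 \leftarrow k_1+1$, $k_2 \leftarrow k_2+1$, last $\leftarrow$ both
  end if
end while
if $\alpha_k > \alpha$ then (the last merging step overshot $\alpha$ and is undone)
  if last $= \beta$ then
    $k_1 \leftarrow k_1 - 1$
  else if last $= \gamma$ then
    $k_2 \leftarrow k_2 - 1$
  else
    $k_1 \leftarrow k_1 - 1$, $k_2 \leftarrow k_2 - 1$
  end if
end if
for $n = 1$ to $k_2 + 1$ do
  $\nu_\alpha(x_n) = \big(\sum_{j=1}^{k_2+1} \mu(x_j) - \alpha\big)/(1 + k_2)$
end for
for $n = k_2 + 2$ to $|\Sigma| - k_1 - 1$ do
  $\nu_\alpha(x_n) = \mu(x_n)$
end for
for $n = |\Sigma| - k_1$ to $|\Sigma|$ do
  $\nu_\alpha(x_n) = \big(\sum_{j=|\Sigma|-k_1}^{|\Sigma|} \mu(x_j) + \alpha\big)/(1 + k_1)$
end for
return $\nu_\alpha$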
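The following Python sketch follows the merging rule of Theorem 2 and Algorithm 1; it is our illustration, not the authors' code, and the function name is hypothetical. One deliberate difference: instead of overshooting $\alpha$ and undoing the last merging step, it compares the next intersection (32) with $\alpha$ before merging, which yields the same groups. Ties are handled one group per iteration, and the loop stops once the two groups are adjacent, since for $\alpha < \alpha_{\max}$ no further merging can occur. Group sums are updated incrementally, so each merging step costs $O(1)$.

```python
def algorithm1(mu, alpha):
    """Merging-rule sketch of Algorithm 1 (equations (29)-(32) and (34)).

    mu    : nominal probabilities sorted non-increasingly, len(mu) >= 2.
    alpha : R / 2 in [0, 1].
    Returns (nu_alpha, merges), merges being the intersections alpha_1,
    alpha_2, ... that lie at or below alpha."""
    n = len(mu)
    a_max = sum(max(0.0, 1.0 / n - m) for m in mu)       # (34)
    if alpha >= a_max:
        return [1.0 / n] * n, []                          # all weights equal
    k1 = k2 = 0
    bottom_sum = mu[n - 1]   # sum of mu over Sigma^{0,k1} = {x_{n-k1}, ..., x_n}
    top_sum = mu[0]          # sum of mu over Sigma_{0,k2} = {x_1, ..., x_{k2+1}}
    merges = []
    while k1 + k2 < n - 2:   # stop once the two groups are adjacent
        beta_next = (k1 + 1) * mu[n - 2 - k1] - bottom_sum   # (30)
        gamma_next = top_sum - (k2 + 1) * mu[k2 + 1]         # (31)
        if min(beta_next, gamma_next) > alpha:               # (32) lies beyond alpha
            break
        if beta_next <= gamma_next:      # lowest weights absorb x_{n-k1-1}
            merges.append(beta_next)
            k1 += 1
            bottom_sum += mu[n - 1 - k1]
        else:                            # highest weights absorb x_{k2+2}
            merges.append(gamma_next)
            k2 += 1
            top_sum += mu[k2]
    nu = list(mu)                        # middle symbols keep mu(x), see (29)
    for i in range(k2 + 1):
        nu[i] = (top_sum - alpha) / (k2 + 1)
    for i in range(n - 1 - k1, n):
        nu[i] = (bottom_sum + alpha) / (k1 + 1)
    return nu, merges


nu, merges = algorithm1([0.5, 0.25, 0.15, 0.1], alpha=0.2)
```

For $\mu = (0.5, 0.25, 0.15, 0.1)$ and $\alpha = 0.2$, a single merging occurs at $\alpha_1 = 0.05$ (the two smallest symbols), and the returned weights are $(0.3, 0.25, 0.225, 0.225)$.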
IV. Illustrative Examples
This section presents illustrative examples of the optimal codes derived in this paper. 

A. Illustrative theoretical example 

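The following example illustrates how the weights $\nu_\alpha$ and the cardinality of the sets $\Sigma_0$ and $\Sigma^0$ change as a function of $\alpha \in [0,1]$.

Consider the special case when the probability vector $\mu \in \mathcal{P}_\mu(\Sigma)$ consists of distinct probabilities, e.g., $\mu(x_{|\Sigma|}) < \mu(x_{|\Sigma|-1})$ and $\mu(x_2) < \mu(x_1)$. The goal is to characterize the weights in a subset of $\alpha \in [0,1]$ such that $\nu_\alpha(x_{|\Sigma|}) < \nu_\alpha(x_{|\Sigma|-1})$ and $\nu_\alpha(x_2) < \nu_\alpha(x_1)$ hold. Since $\Sigma^0 = \{x_{|\Sigma|}\}$ ($|\Sigma^0| = 1$) and $\Sigma_0 = \{x_1\}$ ($|\Sigma_0| = 1$), then

$$L_\alpha(\mathbf{l}, \mu) = \sum_{x \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)} l(x)\mu(x) + \big(\mu(x_{|\Sigma|}) + \alpha\big) l(x_{|\Sigma|}) + \big(\mu(x_1) - \alpha\big) l(x_1),$$

where the weights are given by $\nu_\alpha(x) = \mu(x)$, $x \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$, $\nu_\alpha(x_{|\Sigma|}) = \mu(x_{|\Sigma|}) + \alpha$ and $\nu_\alpha(x_1) = \mu(x_1) - \alpha$ (by Lemma 2). For any $\alpha \in [0,1]$ such that the conditions $\nu_\alpha(x_{|\Sigma|}) < \nu_\alpha(x_{|\Sigma|-1})$ and $\nu_\alpha(x_2) < \nu_\alpha(x_1)$ hold, the optimal codeword lengths are given by $-\log \nu_\alpha(x)$, $x \in \Sigma$, and this region of $\alpha \in [0,1]$, for which $|\Sigma^0| = 1$ and $|\Sigma_0| = 1$, satisfies the following inequalities:

$$\mu(x_{|\Sigma|}) + \alpha < \mu(x_{|\Sigma|-1}) \quad \text{and} \quad \mu(x_1) - \alpha > \mu(x_2). \tag{35}$$

Equivalently,

$$\big\{ \alpha \in [0,1] : \alpha < \min\{\mu(x_{|\Sigma|-1}) - \mu(x_{|\Sigma|}),\ \mu(x_1) - \mu(x_2)\} \big\}.$$

Hence, under the conditions $\Sigma^0 = \{x_{|\Sigma|}\}$ ($|\Sigma^0| = 1$) and $\Sigma_0 = \{x_1\}$ ($|\Sigma_0| = 1$), the optimal codeword lengths are given by $-\log \nu_\alpha(x)$, $x \in \Sigma$, for $\alpha < \alpha_1 = \min\{\mu(x_{|\Sigma|-1}) - \mu(x_{|\Sigma|}),\ \mu(x_1) - \mu(x_2)\}$, while for $\alpha \geq \alpha_1$ the form of the minimization problem changes, as more weights $\nu_\alpha(x)$ enter either $\Sigma^0$ or $\Sigma_0$ and the cardinality of that set changes; that is, the partition of $\Sigma$ into $\Sigma \setminus (\Sigma_0 \cup \Sigma^0)$, $\Sigma^0$ and $\Sigma_0$ changes. Note that when $\mu(x_{|\Sigma|}) = \mu(x_{|\Sigma|-1})$, in view of the continuity of the weights $\nu_\alpha$ as a function of $\alpha \in [0,1]$, the above optimal codeword lengths are only characterized for the singleton point $\alpha = \alpha_1 = 0$, giving the classical codeword lengths. For $\alpha \in (0,1)$ the problem should be reformulated.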
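Without loss of generality, and for simplicity of exposition, suppose that $\mu(x_1) - \mu(x_2) < \mu(x_{|\Sigma|-1}) - \mu(x_{|\Sigma|})$. If we now consider the case $\alpha > \alpha_1$ with $|\Sigma_0| = 2$, i.e., $\Sigma_0 = \{x_1, x_2\}$, the problem can be written as

$$\min_{l} \Big\{ \sum_{x \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)} l(x)\mu(x) + \big(\mu(x_{|\Sigma|}) + \alpha\big) l_{\max} + \big(\mu(x_1) + \mu(x_2) - \alpha\big) l_{\min} \Big\} = \min_{l} \sum_{x \in \Sigma} l(x)\,\nu_\alpha(x),$$

with $\nu_\alpha(x_1) = \nu_\alpha(x_2) = \frac{\mu(x_1) + \mu(x_2) - \alpha}{2}$. For any $\alpha \in [\alpha_1, 1)$ such that the conditions $\nu_\alpha(x_{|\Sigma|}) < \nu_\alpha(x_{|\Sigma|-1})$ and $\nu_\alpha(x_3) < \nu_\alpha(x_2)$ hold, the optimal codeword lengths are given by $-\log \nu_\alpha(x)$, $x \in \Sigma$, and this region is specified by

$$\Big\{\alpha \in [0, 1] : \alpha_1 \le \alpha < \min\big\{\mu(x_{|\Sigma|-1}) - \mu(x_{|\Sigma|}),\ \mu(x_1) + \mu(x_2) - 2\mu(x_3)\big\}\Big\}. \tag{36}$$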
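For a concrete distribution, the end points of the regions (35) and (36) are simple differences of sorted probabilities. A minimal sketch with a hypothetical helper `merging_regions`, assuming $|\Sigma| \ge 3$ and probabilities given exactly as `Fraction`s; it returns $\alpha_1$ and, only in the case $\mu(x_1) - \mu(x_2) < \mu(x_{|\Sigma|-1}) - \mu(x_{|\Sigma|})$ treated above, the right end point of (36):

```python
from fractions import Fraction


def merging_regions(mu):
    """Hypothetical helper: alpha_1 from (35) and, in the top-first case, the end of (36)."""
    m = sorted(mu, reverse=True)
    if len(m) < 3:
        raise ValueError("need at least three symbols")
    top_gap, bottom_gap = m[0] - m[1], m[-2] - m[-1]
    alpha_1 = min(bottom_gap, top_gap)                  # end point of region (35)
    if not top_gap < bottom_gap:
        return alpha_1, None                            # (36) covers only the top-first case
    alpha_2 = min(bottom_gap, m[0] + m[1] - 2 * m[2])   # end point of region (36)
    return alpha_1, alpha_2


mu = [Fraction(13, 30), Fraction(12, 30), Fraction(4, 30), Fraction(1, 30)]
print(merging_regions(mu))  # (Fraction(1, 30), Fraction(1, 10))
```

Here $\Sigma_0 = \{x_1, x_2\}$ on $[1/30, 1/10)$, and at $\alpha = 1/10$ the symbol $x_3$ joins $\Sigma^0$.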

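The procedure is repeated, and the problem is reformulated, until all weights $\nu_\alpha(x) = \mu(x)$, $x \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$, have joined the sets $\Sigma^0$ and $\Sigma_0$. Eventually, for large enough $\alpha$, the sets $\Sigma^0$ and $\Sigma_0$ merge, all weights equal $1/|\Sigma|$, and $l(x) = \log |\Sigma|$ for all $x \in \Sigma$.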

B. Optimal weights for all $\alpha \in [0, 1]$ for specific probability distributions
Consider binary codewords and a source with $|\Sigma| = 4$ and probability distribution

$$\mu = \left(\tfrac{8}{15}, \tfrac{4}{15}, \tfrac{2}{15}, \tfrac{1}{15}\right).$$

Using Algorithm 1, one can find the optimal weight vector $\nu^*_\alpha$ for different values of $\alpha \in [0, 1]$ for which pay-off (11) of Problem 2 is minimized. The weights for all $\alpha \in [0, 1]$ can be calculated iteratively by computing $\alpha_k$ for all $k \in \{0, 1, 2, 3\}$ and noting that the weights vary linearly with $\alpha$ between consecutive values $\alpha_k$ (Figure 3).
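A sketch of how the breakpoints $\alpha_k$ might be computed (hypothetical helper `merge_breakpoints`, exact `Fraction` arithmetic assumed). Each breakpoint is a value of $\alpha$ at which one more symbol joins $\Sigma_0$ or $\Sigma^0$, followed by the value $\sum_x (\mu(x) - 1/|\Sigma|)^+$ at which, under our assumption, the two sets meet and all weights equal $1/|\Sigma|$. For the distribution above it returns four values, consistent with $k \in \{0, 1, 2, 3\}$ and with $\alpha_1 = 1/15$ in (37):

```python
from fractions import Fraction


def merge_breakpoints(mu):
    """Hypothetical helper: values of alpha at which a symbol joins Sigma_0 or Sigma^0."""
    m = sorted(mu, reverse=True)
    n = len(m)
    # Past this value the two sets have met and every weight equals 1/n.
    full = sum(max(x - Fraction(1, n), 0) for x in m)
    events = set()
    for p in range(1, n):
        # Sigma_0 = {x_1..x_p} absorbs x_{p+1} when (S_p - alpha)/p = mu(x_{p+1}).
        events.add(sum(m[:p]) - p * m[p])
        # Sigma^0 = {x_{n-p+1}..x_n} absorbs x_{n-p} when (B_p + alpha)/p = mu(x_{n-p}).
        events.add(p * m[n - 1 - p] - sum(m[n - p:]))
    return [Fraction(0)] + sorted(a for a in events if 0 < a < full) + [full]


mu4 = [Fraction(8, 15), Fraction(4, 15), Fraction(2, 15), Fraction(1, 15)]
print(merge_breakpoints(mu4))
# [Fraction(0, 1), Fraction(1, 15), Fraction(4, 15), Fraction(3, 10)]
```

Thresholds at or beyond the final meeting value are discarded, because after that point the partition no longer changes.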

[Figure 3 (plot): weights for different values of $\alpha$; horizontal axis: parameter $\alpha = R/2$.]

Fig. 3. A schematic representation of the weights for different values of $\alpha$ when $\mu = \left(\frac{8}{15}, \frac{4}{15}, \frac{2}{15}, \frac{1}{15}\right)$.
The first merging occurs when

$$\alpha_1 = \min\big\{\mu(x_{|\Sigma|-1}) - \mu(x_{|\Sigma|}),\ \mu(x_1) - \mu(x_2)\big\} = \min\left\{\tfrac{2}{15} - \tfrac{1}{15},\ \tfrac{8}{15} - \tfrac{4}{15}\right\} = \tfrac{1}{15}. \tag{37}$$

For $\alpha = \alpha_1$ the optimal weights are $\nu^*_{\alpha_1} = \left(\frac{7}{15}, \frac{4}{15}, \frac{2}{15}, \frac{2}{15}\right)$.

Now consider binary codewords and a source with $|\Sigma| = 5$ and probability distribution

$$\mu = \left(\tfrac{16}{31}, \tfrac{8}{31}, \tfrac{4}{31}, \tfrac{2}{31}, \tfrac{1}{31}\right).$$

Using Algorithm 1, one can find the optimal weight vector $\nu^*_\alpha$ for different values of $\alpha \in [0, 1]$ for which pay-off (11) of Problem 2 is minimized.
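Working the merging rule by hand for this distribution gives the breakpoints below. This is our own computation, offered as a check rather than as a result stated above. The bottom set first absorbs $x_4$ and then $x_3$. The top set then absorbs $x_2$. Finally, the two sets meet, after which all weights equal $1/5$:

```latex
\begin{aligned}
\alpha_1 &= \mu(x_4) - \mu(x_5) = \tfrac{1}{31}
  &&\Sigma^0 = \{x_4, x_5\},\\
\alpha_2 &= 2\mu(x_3) - \mu(x_4) - \mu(x_5) = \tfrac{5}{31}
  &&\Sigma^0 = \{x_3, x_4, x_5\},\\
\alpha_3 &= \mu(x_1) - \mu(x_2) = \tfrac{8}{31}
  &&\Sigma_0 = \{x_1, x_2\},\\
\alpha_4 &:\ \tfrac{\mu(x_1) + \mu(x_2) - \alpha_4}{2}
  = \tfrac{\mu(x_3) + \mu(x_4) + \mu(x_5) + \alpha_4}{3}
  \ \Longrightarrow\ \alpha_4 = \tfrac{58}{155}
  &&\nu_{\alpha_4}(x) = \tfrac{1}{5}\ \ \forall x .
\end{aligned}
```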



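[Figure 4 (plot): weights for different values of $\alpha$; horizontal axis: parameter $\alpha = R/2$.]

Fig. 4. A schematic representation of the weights for different values of $\alpha$ when $\mu = \left(\frac{16}{31}, \frac{8}{31}, \frac{4}{31}, \frac{2}{31}, \frac{1}{31}\right)$.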



Given the weights, the problem is transformed into a standard average-length coding problem, in which the optimal codeword lengths are easily calculated for every $\alpha$; they are equal to $\lceil -\log \nu_\alpha(x) \rceil$, $\forall x \in \Sigma$.
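For binary codewords, the lengths follow directly from the weights. Here is a minimal sketch using a hypothetical helper `binary_lengths`, applied to the weights $\nu^*_{\alpha_1}$ of the $|\Sigma| = 4$ example. Because the weights sum to one, the rounded-up lengths satisfy the Kraft inequality:

```python
import math
from fractions import Fraction


def binary_lengths(weights):
    """Hypothetical helper: real lengths -log2 nu(x) and integer lengths ceil(-log2 nu(x))."""
    real = [-math.log2(w) for w in weights]
    return real, [math.ceil(r) for r in real]


# Weights of the |Sigma| = 4 example at alpha = alpha_1 = 1/15.
nu = [Fraction(7, 15), Fraction(4, 15), Fraction(2, 15), Fraction(2, 15)]
real, integer = binary_lengths(nu)
print(integer)                                        # [2, 2, 3, 3]
print(sum(2.0 ** -length for length in integer))      # 0.75
```

The real-valued lengths are the ones relevant to arithmetic coding, and the integer lengths are the ones used by Shannon-type codes.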

V. Conclusions 

The solution to a minimax average codeword length lossless coding problem for the class of sources described by the total variational distance ball is presented. First, the problem is transformed into an equivalent optimization problem by finding an explicit expression for the maximization over the total variational distance ball. Subsequently, we give two solutions to the initial minimax coding problem for this class of sources. The first solution is given in terms of waterfilling with two distinct levels. The second solution is given by a procedure based on re-normalization of the fixed nominal source probabilities according to a specific merging rule of symbols. Several properties of the solution are derived, and an algorithm that computes the minimax codeword lengths is presented. Illustrative examples corroborating the performance of the codes are presented.

Although we consider the average codeword length, other pay-offs, such as the average redundancy, the average of an exponential function of the redundancy, and the pointwise redundancy, can be handled without much variation in the method of solution.



Appendix A 
Proofs 

A. Proof of Theorem 1

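The problem can be expressed as

$$\min_{l} \min_{t} \min_{s} \Big\{ \alpha(t - s) + \sum_{x \in \Sigma} l(x)\mu(x) \Big\}, \tag{38}$$

subject to the Kraft inequality and the constraints $l(x) \le t$, $\forall x \in \Sigma$, and $l(x) \ge s$, $\forall x \in \Sigma$; the minimizations over $t$ and $s$ yield $t = l_{\max}$ and $s = l_{\min}$. By introducing real-valued Lagrange multipliers $\lambda(x)$ associated with the constraints $l(x) \le t$, $\forall x \in \Sigma$, $\sigma(x)$ associated with the constraints $l(x) \ge s$, $\forall x \in \Sigma$, and a real-valued Lagrange multiplier $\tau$ associated with the Kraft inequality, the augmented pay-off is defined by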



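$$L_\alpha(l, \mu, t, s, \lambda, \sigma, \tau) = \alpha(t - s) + \sum_{x \in \Sigma} l(x)\mu(x) + \tau\Big(\sum_{x \in \Sigma} D^{-l(x)} - 1\Big) + \sum_{x \in \Sigma} \lambda(x)\big(l(x) - t\big) + \sum_{x \in \Sigma} \sigma(x)\big(s - l(x)\big).$$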

The augmented pay-off is a convex and differentiable function with respect to $l$, $t$ and $s$. Denote the optimal values of $l, t, s, \lambda, \sigma, \tau$ by $l^\dagger, t^\dagger, s^\dagger, \lambda^\dagger, \sigma^\dagger$ and $\tau^\dagger$. By the Karush-Kuhn-Tucker theorem, the following conditions are necessary and sufficient for optimality:
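$$\frac{\partial}{\partial l(x)} L_\alpha(l, \mu, t, s, \lambda, \sigma, \tau)\Big|_{l^\dagger, t^\dagger, s^\dagger, \lambda^\dagger, \sigma^\dagger, \tau^\dagger} = 0, \quad \forall x \in \Sigma, \tag{39}$$

$$\frac{\partial}{\partial t} L_\alpha(l, \mu, t, s, \lambda, \sigma, \tau)\Big|_{l^\dagger, t^\dagger, s^\dagger, \lambda^\dagger, \sigma^\dagger, \tau^\dagger} = 0, \tag{40}$$

$$\frac{\partial}{\partial s} L_\alpha(l, \mu, t, s, \lambda, \sigma, \tau)\Big|_{l^\dagger, t^\dagger, s^\dagger, \lambda^\dagger, \sigma^\dagger, \tau^\dagger} = 0, \tag{41}$$

$$\sum_{x \in \Sigma} D^{-l^\dagger(x)} - 1 \le 0, \tag{42}$$

$$\tau^\dagger \Big(\sum_{x \in \Sigma} D^{-l^\dagger(x)} - 1\Big) = 0, \tag{43}$$

$$\tau^\dagger \ge 0, \tag{44}$$

$$l^\dagger(x) - t^\dagger \le 0, \quad \forall x \in \Sigma, \tag{45}$$

$$\lambda^\dagger(x)\big(l^\dagger(x) - t^\dagger\big) = 0, \quad \forall x \in \Sigma, \tag{46}$$

$$\lambda^\dagger(x) \ge 0, \quad \forall x \in \Sigma, \tag{47}$$

$$s^\dagger - l^\dagger(x) \le 0, \quad \forall x \in \Sigma, \tag{48}$$

$$\sigma^\dagger(x)\big(s^\dagger - l^\dagger(x)\big) = 0, \quad \forall x \in \Sigma, \tag{49}$$

$$\sigma^\dagger(x) \ge 0, \quad \forall x \in \Sigma. \tag{50}$$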

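Differentiating with respect to $l$, the following equation is obtained:

$$\frac{\partial}{\partial l(x)} L_\alpha(l, \mu, t, s, \lambda, \sigma, \tau)\Big|_{l^\dagger, t^\dagger, s^\dagger, \lambda^\dagger, \sigma^\dagger, \tau^\dagger} = \mu(x) - \tau^\dagger D^{-l^\dagger(x)} \log_e D + \lambda^\dagger(x) - \sigma^\dagger(x) = 0, \quad \forall x \in \Sigma, \tag{51}$$

which, after manipulation, becomes

$$D^{-l^\dagger(x)} = \frac{\mu(x) + \lambda^\dagger(x) - \sigma^\dagger(x)}{\tau^\dagger \log_e D}, \quad x \in \Sigma. \tag{52}$$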

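Differentiating with respect to $t$ and $s$, the following equations are obtained:

$$\begin{aligned} \frac{\partial}{\partial t} L_\alpha\Big|_{l^\dagger, t^\dagger, s^\dagger, \lambda^\dagger, \sigma^\dagger, \tau^\dagger} &= \alpha - \sum_{x \in \Sigma} \lambda^\dagger(x) = 0 \;\Longrightarrow\; \sum_{x \in \Sigma} \lambda^\dagger(x) = \alpha, \\ \frac{\partial}{\partial s} L_\alpha\Big|_{l^\dagger, t^\dagger, s^\dagger, \lambda^\dagger, \sigma^\dagger, \tau^\dagger} &= -\alpha + \sum_{x \in \Sigma} \sigma^\dagger(x) = 0 \;\Longrightarrow\; \sum_{x \in \Sigma} \sigma^\dagger(x) = \alpha. \end{aligned} \tag{53}$$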



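When $\tau^\dagger = 0$, (51) gives $\mu(x) = \sigma^\dagger(x) - \lambda^\dagger(x)$, $\forall x \in \Sigma$. Since $\sigma^\dagger(x) = \lambda^\dagger(x) = 0$, $\forall x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)$ (by (46) and (49), because $s^\dagger < l^\dagger(x) < t^\dagger$ there), it follows that $\mu(x) = 0$ for these symbols. However, $\mu(x) > 0$, $\forall x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)$, and therefore necessarily $\tau^\dagger > 0$. Next, $\tau^\dagger$ is found by substituting (52) and (53) into the Kraft equality (which holds by (43), since $\tau^\dagger > 0$) to deduce

$$\sum_{x \in \Sigma} D^{-l^\dagger(x)} = \sum_{x \in \Sigma} \frac{\mu(x) + \lambda^\dagger(x) - \sigma^\dagger(x)}{\tau^\dagger \log_e D} = \frac{1}{\tau^\dagger \log_e D} + \frac{\alpha}{\tau^\dagger \log_e D} - \frac{\alpha}{\tau^\dagger \log_e D} = \frac{1}{\tau^\dagger \log_e D} = 1. \tag{54}$$

Therefore, $\tau^\dagger = \frac{1}{\log_e D}$. Substituting $\tau^\dagger$ into (52) yields

$$D^{-l^\dagger(x)} = \mu(x) + \lambda^\dagger(x) - \sigma^\dagger(x), \quad x \in \Sigma. \tag{55}$$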

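Let $w^\dagger(x) = D^{-l^\dagger(x)}$, i.e., the probabilities that correspond to the codeword lengths $l^\dagger(x)$; also, let $\underline{w} = D^{-t^\dagger}$ and $\overline{w} = D^{-s^\dagger}$. From the Karush-Kuhn-Tucker conditions (46), (47), (49) and (50) we deduce the following. For all $x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)$, $l^\dagger(x) < t^\dagger$ and $l^\dagger(x) > s^\dagger$; hence $\lambda^\dagger(x) = 0$ and $\sigma^\dagger(x) = 0$. For all $x \in \Sigma_0$, $l^\dagger(x) < t^\dagger$ and $l^\dagger(x) = s^\dagger$; hence $\lambda^\dagger(x) = 0$ and $\sigma^\dagger(x) \ge 0$. For all $x \in \Sigma^0$, $l^\dagger(x) = t^\dagger$ and $l^\dagger(x) > s^\dagger$; hence $\lambda^\dagger(x) \ge 0$ and $\sigma^\dagger(x) = 0$. Therefore, (55) splits into the following cases:

$$D^{-l^\dagger(x)} = \mu(x), \quad x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0), \tag{56}$$

$$D^{-l^\dagger(x)} = \mu(x) - \sigma^\dagger(x), \quad x \in \Sigma_0, \tag{57}$$

$$D^{-l^\dagger(x)} = \mu(x) + \lambda^\dagger(x), \quad x \in \Sigma^0. \tag{58}$$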

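Substituting $\lambda^\dagger(x)$ from (58) into (53) we have $\sum_{x \in \Sigma^0} \big(D^{-l^\dagger(x)} - \mu(x)\big) = \alpha$, and substituting $w^\dagger(x) = D^{-l^\dagger(x)}$ we get

$$\sum_{x \in \Sigma^0} \big(w^\dagger(x) - \mu(x)\big) = \alpha. \tag{59}$$

We know that $\lambda^\dagger(x)$ can be nonzero only when $l^\dagger(x) = t^\dagger$. For $x \in \Sigma^0$ we have $w^\dagger(x) = \underline{w} \ge \mu(x)$ by (58), while for every other $x$, $l^\dagger(x) < t^\dagger$ and (56)-(57) give $\mu(x) \ge w^\dagger(x) > \underline{w}$. Hence, $w^\dagger(x) - \mu(x) = (\underline{w} - \mu(x))^+$ for $x \in \Sigma^0$, and $(\underline{w} - \mu(x))^+ = 0$ otherwise, so equation (59) becomes

$$\sum_{x \in \Sigma} \big(\underline{w} - \mu(x)\big)^+ = \alpha, \tag{60}$$

where $(f)^+ = \max(0, f)$. This is the classical waterfilling equation [?, Section 9.4], and $\underline{w}$ is the chosen water level, as shown in Figure 1.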
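If we also substitute $\sigma^\dagger(x)$ from (57) into (53) we have $\sum_{x \in \Sigma_0} \big(\mu(x) - D^{-l^\dagger(x)}\big) = \alpha$, and substituting $w^\dagger(x) = D^{-l^\dagger(x)}$ we get

$$\sum_{x \in \Sigma_0} \big(\mu(x) - w^\dagger(x)\big) = \alpha. \tag{61}$$

Since $w^\dagger(x) = \overline{w} = D^{-s^\dagger} \le \mu(x)$ for $x \in \Sigma_0$, while $\mu(x) \le w^\dagger(x) < \overline{w}$ for every other $x$, substituting $\overline{w}$ turns equation (61) into

$$\sum_{x \in \Sigma} \big(\mu(x) - \overline{w}\big)^+ = \alpha. \tag{62}$$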
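Equations (60) and (62) can also be solved numerically for the two water levels. The sketch below uses bisection, which is our own choice of method and not one taken from the paper. It relies on two hypothetical helpers, `water_level` and `weights_from_levels`, and assumes $\alpha \in [0, 1]$ and $|\Sigma| \ge 2$, so that both levels lie in $[0, 1]$. Symbols below $\underline{w}$ are raised to it, symbols above $\overline{w}$ are lowered to it, and the rest keep $\mu(x)$, matching (56) to (58). If the levels cross, which is the case where $\Sigma^0$ and $\Sigma_0$ have merged, all weights are set to $1/|\Sigma|$.

```python
def water_level(mu, alpha, upper):
    """Bisection on [0, 1] for the level w solving
    sum_x (w - mu(x))^+ = alpha   (upper=False, lower level of (60)), or
    sum_x (mu(x) - w)^+ = alpha   (upper=True,  upper level of (62))."""
    def excess(w):
        if upper:
            return sum(max(m - w, 0.0) for m in mu)
        return sum(max(w - m, 0.0) for m in mu)

    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        # excess(w) decreases in w for the upper level and increases for the lower level
        if (excess(mid) > alpha) == upper:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def weights_from_levels(mu, alpha):
    """Clip mu between the two water levels; uniform weights once the levels cross."""
    w_low = water_level(mu, alpha, upper=False)
    w_up = water_level(mu, alpha, upper=True)
    if w_up <= w_low:
        return [1.0 / len(mu)] * len(mu)
    return [min(max(m, w_low), w_up) for m in mu]


mu = [8 / 15, 4 / 15, 2 / 15, 1 / 15]
print([round(w, 6) for w in weights_from_levels(mu, 1 / 15)])
# [0.466667, 0.266667, 0.133333, 0.133333]
```

For $\mu = (8/15, 4/15, 2/15, 1/15)$ and $\alpha = 1/15$, this reproduces the weights $(7/15, 4/15, 2/15, 2/15)$ up to floating-point error.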

Remark 2. Note that the case in which $\mu(x) = 0$ for some $x \in \Sigma$ can be handled in exactly the same way. In this case, $x \in \Sigma^0$, and from equation (58) it is deduced that $\lambda^\dagger(x) = 0$ at $\alpha = 0$, and hence $D^{-l^\dagger(x)} = 0$. For $\alpha > 0$, it is obvious from equation (58) that $D^{-l^\dagger(x)} = \lambda^\dagger(x)$.



B. Proof of Lemma 1

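By introducing a real-valued Lagrange multiplier $\lambda$ associated with the Kraft inequality constraint, the augmented pay-off is defined by

$$L_\alpha(l, \mu, \lambda) = \sum_{x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)} l(x)\mu(x) + \Big(\sum_{x \in \Sigma^0} \mu(x) + \alpha\Big) l_{\max} + \Big(\sum_{x \in \Sigma_0} \mu(x) - \alpha\Big) l_{\min} + \lambda\Big(\sum_{x \in \Sigma} D^{-l(x)} - 1\Big). \tag{63}$$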

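The augmented pay-off is a convex and differentiable function with respect to $l$. Denote the real-valued minimizer of (63) over $l$ and the associated multiplier by $l^\dagger$ and $\lambda^\dagger$. By the Karush-Kuhn-Tucker theorem, the following conditions are necessary and sufficient for optimality:

$$\frac{\partial}{\partial l(x)} L_\alpha(l, \mu, \lambda)\Big|_{l = l^\dagger, \lambda = \lambda^\dagger} = 0, \tag{64}$$

$$\sum_{x \in \Sigma} D^{-l^\dagger(x)} - 1 \le 0, \tag{65}$$

$$\lambda^\dagger \Big(\sum_{x \in \Sigma} D^{-l^\dagger(x)} - 1\Big) = 0, \tag{66}$$

$$\lambda^\dagger \ge 0. \tag{67}$$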
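Differentiating with respect to $l(x)$ for $x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)$, and with respect to the common lengths $l_{\min}$ (shared by all $x \in \Sigma_0$) and $l_{\max}$ (shared by all $x \in \Sigma^0$), the following equations are obtained:

$$\frac{\partial}{\partial l(x)} L_\alpha(l, \mu, \lambda)\Big|_{l = l^\dagger, \lambda = \lambda^\dagger} = \mu(x) - \lambda^\dagger D^{-l^\dagger(x)} \log_e D = 0, \quad x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0), \tag{68}$$

$$\frac{\partial}{\partial l_{\min}} L_\alpha(l, \mu, \lambda)\Big|_{l = l^\dagger, \lambda = \lambda^\dagger} = \sum_{x' \in \Sigma_0} \mu(x') - \alpha - \lambda^\dagger |\Sigma_0| D^{-l^\dagger(x)} \log_e D = 0, \quad x \in \Sigma_0, \tag{69}$$

$$\frac{\partial}{\partial l_{\max}} L_\alpha(l, \mu, \lambda)\Big|_{l = l^\dagger, \lambda = \lambda^\dagger} = \sum_{x' \in \Sigma^0} \mu(x') + \alpha - \lambda^\dagger |\Sigma^0| D^{-l^\dagger(x)} \log_e D = 0, \quad x \in \Sigma^0. \tag{70}$$

When $\lambda^\dagger = 0$, (68) gives $\mu(x) = 0$, $\forall x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)$. Since $\mu(x) > 0$, necessarily $\lambda^\dagger > 0$.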



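Therefore, (68), (69) and (70) are equivalent to the following identities:

$$D^{-l^\dagger(x)} = \frac{\mu(x)}{\lambda^\dagger \log_e D}, \quad x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0), \tag{71}$$

$$D^{-l^\dagger(x)} = \frac{\sum_{x' \in \Sigma_0} \mu(x') - \alpha}{\lambda^\dagger |\Sigma_0| \log_e D}, \quad x \in \Sigma_0, \tag{72}$$

$$D^{-l^\dagger(x)} = \frac{\sum_{x' \in \Sigma^0} \mu(x') + \alpha}{\lambda^\dagger |\Sigma^0| \log_e D}, \quad x \in \Sigma^0. \tag{73}$$

Next, $\lambda^\dagger$ is found by substituting (71), (72) and (73) into the Kraft equality (which holds by (66), since $\lambda^\dagger > 0$) to deduce: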



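$$\sum_{x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)} \frac{\mu(x)}{\lambda^\dagger \log_e D} + |\Sigma_0| \, \frac{\sum_{x \in \Sigma_0} \mu(x) - \alpha}{\lambda^\dagger |\Sigma_0| \log_e D} + |\Sigma^0| \, \frac{\sum_{x \in \Sigma^0} \mu(x) + \alpha}{\lambda^\dagger |\Sigma^0| \log_e D} = \frac{1}{\lambda^\dagger \log_e D} = 1,$$

that is, $\lambda^\dagger = \frac{1}{\log_e D}$. Substituting $\lambda^\dagger$ into (71), (72) and (73) yields

$$D^{-l^\dagger(x)} = \begin{cases} \mu(x), & x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0), \\[4pt] \dfrac{\sum_{x' \in \Sigma_0} \mu(x') - \alpha}{|\Sigma_0|}, & x \in \Sigma_0, \\[4pt] \dfrac{\sum_{x' \in \Sigma^0} \mu(x') + \alpha}{|\Sigma^0|}, & x \in \Sigma^0. \end{cases}$$

Finally, from the previous expression one obtains (20).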
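C. Proof of Lemma 2

We can show the validity of the statements in Lemma 2 by considering five cases. More specifically:

(i) $x, y \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$: then $\nu_\alpha(x) = \mu(x) \le \mu(y) = \nu_\alpha(y)$, $\forall \alpha \in [0, 1]$;

(ii) $x, y \in \Sigma^0$: $\nu_\alpha(x) = \nu_\alpha(y) = \underline{\nu}_\alpha = \min_{z \in \Sigma} \nu_\alpha(z)$;

(iii) $x, y \in \Sigma_0$: $\nu_\alpha(x) = \nu_\alpha(y) = \overline{\nu}_\alpha = \max_{z \in \Sigma} \nu_\alpha(z)$;

(iv) $x \in \Sigma^0$, $y \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$ (or $x \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$, $y \in \Sigma^0$): consider the case $x \in \Sigma^0$, $y \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$. Then, by taking derivatives,

$$\frac{\partial \nu_\alpha(y)}{\partial \alpha} = 0, \quad y \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0), \tag{74}$$

$$\frac{\partial \nu_\alpha(x)}{\partial \alpha} = \frac{1}{|\Sigma^0|} > 0, \quad x \in \Sigma^0; \tag{75}$$

(v) $x \in \Sigma_0$, $y \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$ (or $x \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$, $y \in \Sigma_0$): consider the case $x \in \Sigma_0$, $y \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$. Then, by taking derivatives,

$$\frac{\partial \nu_\alpha(y)}{\partial \alpha} = 0, \quad y \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0), \tag{76}$$

$$\frac{\partial \nu_\alpha(x)}{\partial \alpha} = -\frac{1}{|\Sigma_0|} < 0, \quad x \in \Sigma_0. \tag{77}$$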



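According to (74)-(77), for $\alpha = 0$, $\nu_\alpha(y)|_{\alpha=0} = \mu(y) \ge \nu_\alpha(x)|_{\alpha=0} = \mu(x)$. As a function of $\alpha \in [0, 1]$, for $y \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$ the weight $\nu_\alpha(y)$ remains unchanged, for $x \in \Sigma^0$ the weight $\nu_\alpha(x)$ increases, and for $z \in \Sigma_0$ the weight $\nu_\alpha(z)$ decreases. Hence, since $\nu_\alpha(\cdot)$ is a continuous function of $\alpha$, at some $\alpha = \alpha'$ we have $\nu_{\alpha'}(x) = \nu_{\alpha'}(y) = \underline{\nu}_{\alpha'}$. Suppose that for some $\alpha = \alpha' + d\alpha$, $d\alpha > 0$, $\nu_\alpha(x) \neq \nu_\alpha(y)$. Then the lowest weight increases while the largest weight remains constant as a function of $\alpha \in [0, 1]$, according to (75) and (74), respectively, so the two weights meet again, which contradicts the assumption. Similar arguments apply when $\nu_{\alpha'}(y) = \nu_{\alpha'}(z) = \overline{\nu}_{\alpha'}$.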

D. Proof of Lemma |i] 

The validity of the statement is shown by induction. Without loss of generality, and for simplicity of the proof, suppose that $\beta_1 < \gamma_1$.

Firstly, for $\alpha = \beta_1$: $\nu_\alpha(x_{|\Sigma|}) = \nu_\alpha(x_{|\Sigma|-1}) < \nu_\alpha(x_{|\Sigma|-2}) < \cdots < \nu_\alpha(x_1)$.
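Suppose that, for $\alpha = \beta_1 + d\alpha \in [0, 1]$, $d\alpha > 0$, $\nu_\alpha(x_{|\Sigma|}) \neq \nu_\alpha(x_{|\Sigma|-1})$. Then $\Sigma^0 = \{x_{|\Sigma|}\}$, $\Sigma_0 = \{x_1\}$,

$$L_\alpha(l, \mu) = \big(\mu(x_{|\Sigma|}) + \alpha\big) l_{\max} + \big(\mu(x_1) - \alpha\big) l_{\min} + \sum_{x \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)} l(x)\mu(x),$$

and the weights are of the form $\nu_\alpha(x) = \mu(x)$ for $x \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0)$, $\nu_\alpha(x) = \mu(x_1) - \alpha$ for $x \in \Sigma_0$, and $\nu_\alpha(x) = \mu(x_{|\Sigma|}) + \alpha$ for $x \in \Sigma^{0,1} = \{x_{|\Sigma|}\}$. The rate of change of these weights with respect to $\alpha$ is

$$\frac{\partial \nu_\alpha(x)}{\partial \alpha} = 0, \quad x \in \Sigma \setminus (\Sigma_0 \cup \Sigma^0), \tag{78}$$

$$\frac{\partial \nu_\alpha(y)}{\partial \alpha} = 1 > 0, \quad y \in \Sigma^{0,1}. \tag{79}$$

Hence, the largest of the two weights stays constant, while the smallest would increase, and therefore they meet again. This contradicts the assumption that $\nu_\alpha(x_{|\Sigma|}) \neq \nu_\alpha(x_{|\Sigma|-1})$ for $\alpha > \beta_1$. Therefore, $\nu_\alpha(x_{|\Sigma|}) = \nu_\alpha(x_{|\Sigma|-1})$, $\forall \alpha \in [\beta_1, 1)$.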

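Similarly, for $\alpha > \alpha_k$, $k \in \{2, \ldots, |\Sigma| - 1\}$, suppose the weights have been merged into the sets $\Sigma^{0,k_1}$ and $\Sigma_{0,k_2}$. Then, the pay-off is written as

$$L_\alpha(l, \mu) = \sum_{x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0)} l(x)\mu(x) + \Big(\sum_{x \in \Sigma^{0,k_1}} \mu(x) + \alpha\Big) l_{\max} + \Big(\sum_{x \in \Sigma_{0,k_2}} \mu(x) - \alpha\Big) l_{\min}.$$

Hence,

$$\frac{\partial \nu_\alpha(x)}{\partial \alpha} = 0, \quad x \in \Sigma \setminus (\Sigma^0 \cup \Sigma_0), \ \alpha \in (\alpha_k, 1), \tag{80}$$

$$\frac{\partial \nu_\alpha(x)}{\partial \alpha} = \frac{1}{|\Sigma^{0,k_1}|} > 0, \quad x \in \Sigma^{0,k_1}, \ \alpha \in (\alpha_k, 1). \tag{81}$$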

Finally, in the case $\alpha > \alpha_{k+1}$, $k \in \{2, \ldots, |\Sigma| - 2\}$, if any of the weights $\nu_\alpha(x)$, $x \in \Sigma^{0,k_1+1}$, changes differently from another, then either at least one probability becomes smaller than the others and gives a longer codeword, or it increases faster than the others and hence, according to (80), stays constant until it meets the other weights. Therefore, all weights in this new set must change in the same way, and the cardinality of $\Sigma^{0,k_1+1}$ increases by one, that is, $|\Sigma^{0,k_1+1}| = k_1 + 1$, $k_1 \in \{1, \ldots, |\Sigma| - 2\}$.

With similar arguments, we prove that the weights $\nu_\alpha(x)$, $x \in \Sigma_{0,k_2}$, change in the same way and that the cardinality of $\Sigma_{0,k_2}$ increases by one.